A-RMA Results
A-RMA Method

The first two phases are identical to RMA. In the first phase, the base policy takes as input the current state, the previous action, and the extrinsics vector, which is a compressed encoding of the privileged environmental factors; this policy is trained in simulation with model-free RL. In the second phase, the adaptation module is trained via supervised learning on on-policy data to predict the extrinsics vector from the history of states and actions. A-RMA adds a third phase in which the base policy is fine-tuned with PPO while the adaptation module is kept fixed, so that the policy learns to compensate for imperfect extrinsics estimates. We found this third phase to be critical for reliable performance in the real world.
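The three-phase structure can be summarized in code. The sketch below is a minimal illustration, not the paper's implementation: the network sizes, dimension constants (STATE_DIM, Z_DIM, HIST_LEN, etc.), and helper names (EnvFactorEncoder, BasePolicy, AdaptationModule, phase2_step) are all assumptions chosen for clarity, and the PPO machinery for phases 1 and 3 is stubbed out.

```python
# Minimal sketch of the three A-RMA training phases (hypothetical dimensions
# and module names; PPO rollout/update machinery is omitted).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, ENV_DIM, Z_DIM, HIST_LEN = 30, 12, 17, 8, 50

class EnvFactorEncoder(nn.Module):
    """Phase 1: compresses privileged environment factors into the extrinsics vector z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ENV_DIM, 64), nn.ReLU(), nn.Linear(64, Z_DIM))

    def forward(self, env_factors):
        return self.net(env_factors)

class BasePolicy(nn.Module):
    """Maps (current state, previous action, extrinsics z) to an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + Z_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM))

    def forward(self, state, prev_action, z):
        return self.net(torch.cat([state, prev_action, z], dim=-1))

class AdaptationModule(nn.Module):
    """Phase 2: predicts z from the recent history of states and actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIST_LEN * (STATE_DIM + ACTION_DIM), 256), nn.ReLU(),
            nn.Linear(256, Z_DIM))

    def forward(self, history):  # history: (batch, HIST_LEN * (STATE_DIM + ACTION_DIM))
        return self.net(history)

encoder, policy, adapter = EnvFactorEncoder(), BasePolicy(), AdaptationModule()

# Phase 1 (sketch): train encoder + policy jointly with model-free RL (e.g. PPO),
# feeding the policy z = encoder(privileged env factors).

# Phase 2: supervised regression of the adaptation module's estimate onto the
# ground-truth extrinsics, using on-policy rollouts.
opt2 = torch.optim.Adam(adapter.parameters(), lr=1e-3)

def phase2_step(history, env_factors):
    with torch.no_grad():
        z_target = encoder(env_factors)  # ground-truth extrinsics from phase 1
    loss = nn.functional.mse_loss(adapter(history), z_target)
    opt2.zero_grad()
    loss.backward()
    opt2.step()
    return loss.item()

# Phase 3 (A-RMA's addition): freeze the adaptation module and fine-tune the
# base policy with PPO on the *estimated* extrinsics z_hat = adapter(history),
# so the policy learns to act well under the imperfect estimate it will
# actually receive at deployment.
for p in adapter.parameters():
    p.requires_grad_(False)
opt3 = torch.optim.Adam(policy.parameters(), lr=3e-4)
# ... run PPO updates on policy(state, prev_action, adapter(history)) ...
```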
RMA: Rapid Motor Adaptation for Legged Robots. Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik. RSS 2021. Pdf | Video | Project Page
Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots. Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak. CoRL 2021. Pdf | Video | Project Page
Coupling Vision and Proprioception for Navigation of Legged Robots. Zipeng Fu*, Ashish Kumar*, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak. Pdf | Video | Project Page