A-RMA Results
A-RMA Method

[Figure: overview of the three training phases of A-RMA]

The first two phases are the same as in RMA. In the first phase, the base policy takes as input the current state, the previous action, and the extrinsics vector, a compressed version of the privileged environmental factors. The base policy is trained in simulation using model-free RL. In the second phase, the adaptation module is trained via supervised learning on on-policy data to predict the extrinsics vector from the history of states and actions. We add a third phase in which the base policy is fine-tuned with PPO while keeping the adaptation module fixed, to account for the imperfect estimation of the extrinsics. We found this third phase to be critical for reliable performance in the real world.
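To make the three-phase structure concrete, here is a minimal PyTorch sketch of the pipeline described above. All module definitions, layer sizes, dimensions, and the history horizon are illustrative assumptions, not the authors' released code; the RL rollouts and the PPO update itself are omitted.

```python
# Hypothetical sketch of the A-RMA training phases. Sizes and names are
# illustrative; the actual architecture and PPO machinery are not shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

X_DIM, A_DIM, E_DIM, Z_DIM, HORIZON = 30, 12, 17, 8, 50  # assumed sizes

class EnvFactorEncoder(nn.Module):
    """Phase 1: compress privileged environment factors e_t into extrinsics z_t."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(E_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, Z_DIM))

    def forward(self, e):
        return self.net(e)

class BasePolicy(nn.Module):
    """Acts on the current state x_t, previous action a_{t-1}, and extrinsics z_t."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(X_DIM + A_DIM + Z_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, A_DIM))

    def forward(self, x, a_prev, z):
        return self.net(torch.cat([x, a_prev, z], dim=-1))

class AdaptationModule(nn.Module):
    """Phase 2: regress z_t from a recent state-action history, no privileged info."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(HORIZON * (X_DIM + A_DIM), 256), nn.ReLU(),
                                 nn.Linear(256, Z_DIM))

    def forward(self, hist):  # hist: (batch, HORIZON * (X_DIM + A_DIM))
        return self.net(hist)

encoder, policy, adapter = EnvFactorEncoder(), BasePolicy(), AdaptationModule()

# --- Phase 2: supervised learning on on-policy data (dummy tensors here) ----
batch = 64
e = torch.randn(batch, E_DIM)                        # privileged factors (sim only)
hist = torch.randn(batch, HORIZON * (X_DIM + A_DIM)) # rolled-out state-action history
z_target = encoder(e).detach()                       # ground-truth extrinsics
adapt_loss = F.mse_loss(adapter(hist), z_target)
adapt_loss.backward()                                # step an optimizer in practice

# --- Phase 3: fine-tune the base policy while the adapter stays frozen ------
for p in adapter.parameters():
    p.requires_grad_(False)
x, a_prev = torch.randn(batch, X_DIM), torch.randn(batch, A_DIM)
action = policy(x, a_prev, adapter(hist))  # would feed a PPO update on policy params
print(action.shape)                        # torch.Size([64, 12])
```

The key design point, as stated above, is that phase 3 optimizes only the base policy against the adaptation module's imperfect extrinsics estimates, so the policy learns to be robust to exactly the estimation errors it will see at deployment.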
RMA: Rapid Motor Adaptation for Legged Robots Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik RSS 2021 Pdf | Video | Project Page
Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak CoRL 2021 Pdf | Video | Project Page
Coupling Vision and Proprioception for Navigation of Legged Robots Zipeng Fu*, Ashish Kumar*, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak CVPR 2022 Pdf | Video | Project Page