Overview

The development of walking controllers for legged robots is a complex process that involves careful modeling of the system and robust controller design. If the model of the system is imperfect, non-negligible uncertainty is introduced into the system, which can result in instability. To counteract this uncertainty, stochastic methods, namely reinforcement learning (RL), have been used to train legged robots to achieve stable gaits. Many of these RL methods lie in the domain of imitation learning, but imitation can constrict the policy's exploration process. In addition, a high-quality reference trajectory must be defined before RL training begins, and producing one can be tedious. On the other hand, model-free RL methods give learning agents the freedom to explore their state spaces, but the resulting gaits may be neither optimal nor natural enough for sim-to-real transfer. Recent work introduced an intuitive way to design reward functions that guide a model-free RL agent to learn a spectrum of common bipedal gaits, balancing the constraining but well-specified approach of imitation learning against the freeing but under-specified approach of model-free RL. Since quadrupeds are close relatives of bipeds, a natural question is whether this framework also works for quadrupeds. This Master's project answered this question.
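The report does not reproduce the reward framework itself; as a rough illustration of the idea, the sketch below shows the kind of periodic reward composition the referenced bipedal-gait work uses, where phase-indicator windows decide which quantities (foot contact forces, foot slip speeds) are penalized at each point in the gait cycle. All function names, window values, and the smooth-indicator approximation here are illustrative assumptions, not the project's actual rewards.

import numpy as np

def phase_indicator(phi, start, end, sharpness=50.0):
    """Smooth 0/1 window over the gait-cycle phase phi in [0, 1).

    Approximates the expected value of a probabilistic phase
    indicator with a pair of sigmoids.
    """
    rise = 1.0 / (1.0 + np.exp(-sharpness * (phi - start)))
    fall = 1.0 / (1.0 + np.exp(-sharpness * (phi - end)))
    return rise - fall

def gait_reward(phi, foot_forces, foot_speeds, swing_windows):
    """Periodic reward composition for one timestep.

    phi           -- gait-cycle phase in [0, 1)
    foot_forces   -- ground-reaction force magnitude per foot
    foot_speeds   -- horizontal speed per foot
    swing_windows -- (start, end) phase window per foot during
                     which that foot should be airborne
    """
    reward = 0.0
    for force, speed, (start, end) in zip(foot_forces, foot_speeds,
                                          swing_windows):
        swing = phase_indicator(phi, start, end)  # ~1 during swing
        stance = 1.0 - swing                      # ~1 during stance
        reward -= swing * force   # penalize contact during swing
        reward -= stance * speed  # penalize foot slip during stance
    return reward

# Pronking: all four feet share one swing window, so they leave
# and strike the ground together. Staggering the windows per foot
# would instead specify a gallop.
pronk_windows = [(0.5, 1.0)] * 4

Under this scheme, changing the gait means only shifting the per-foot swing windows; the policy, environment, and learning algorithm stay fixed.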

Results

Half Cheetah Pronking with PPO
Half Cheetah Galloping with PPO
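The report does not specify its training stack; as a minimal sketch, assuming Gymnasium and Stable-Baselines3 (neither confirmed by the source), the following shows how a PPO agent could be trained on the MuJoCo Half Cheetah environment. A gait-specific reward like the composition sketched above would replace the environment's default forward-progress reward, e.g. via a wrapper.

import gymnasium as gym
from stable_baselines3 import PPO

# Standard MuJoCo Half Cheetah task; environment ID is an assumption.
env = gym.make("HalfCheetah-v4")

# Train a feedforward policy with PPO.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)

# Roll out the learned gait.
obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()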