What is Transfer Learning, Briefly
Transfer learning is basically a brain transplant – taking a neural net that was trained for one task, and applying it (or transferring it) to another task.
Goal of this Project
I made a real balancing robot named Beaker. Wouldn’t it be cool if I could build an accurate model of Beaker in simulation, train a policy in the simulation, and then transfer that policy to the real robot?
Inspiration came from watching this short video, which demonstrates transfer learning from a virtual to a real quadruped. Here is the corresponding paper. This paper gets a lot of credit, and I certainly took some cues from it, like methods to better model latency and motors. Thanks guys!!!
Building the Model
First up, I need to build the Beaker model. This includes the physical characteristics of the robot (dimensions, mass, inertia) and the joints (in Beaker’s case it’s really just the wheels until I also model the motors and bevel gears).
Note: The more realistic the model (and environment) is, the better the transfer learning will work.
Dimensions were taken (again):
I decided to build the model in the Unified Robot Description Format (URDF), which is just XML and a common format for describing robots. I created a (very basic) URDF file of Beaker. From there I wanted to view the URDF model. There are a few apps for this on the Mac. I found a cute, simple open-source one called urdf-viz.
URDF is verbose and certainly not intuitive. After a little while I had the above model.
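To give a feel for that verbosity, here is a minimal sketch of what a two-wheeled balancing bot can look like in URDF, wrapped in a bit of Python that parses it and lists the links and joints. This is not Beaker’s actual file: the robot name, every dimension, mass and inertia value, and the `summarize_urdf` helper are made-up placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Every number below is a placeholder, NOT one of Beaker's real measurements.
BODY_LINK = """
  <link name="body">
    <inertial>
      <origin xyz="0 0 0.10" rpy="0 0 0"/>
      <mass value="1.0"/>
      <inertia ixx="0.01" ixy="0" ixz="0" iyy="0.01" iyz="0" izz="0.005"/>
    </inertial>
    <collision>
      <origin xyz="0 0 0.10" rpy="0 0 0"/>
      <geometry><box size="0.10 0.05 0.20"/></geometry>
    </collision>
  </link>"""

# One wheel link plus the joint that attaches it to the body.
# URDF cylinders lie along z, so the collision shape is rolled 90 degrees
# to make the wheel spin about the y axis.
WHEEL_TEMPLATE = """
  <link name="{side}_wheel">
    <inertial>
      <mass value="0.05"/>
      <inertia ixx="1e-4" ixy="0" ixz="0" iyy="1e-4" iyz="0" izz="1e-4"/>
    </inertial>
    <collision>
      <origin xyz="0 0 0" rpy="1.5708 0 0"/>
      <geometry><cylinder radius="0.04" length="0.02"/></geometry>
    </collision>
  </link>
  <joint name="{side}_wheel_joint" type="continuous">
    <parent link="body"/>
    <child link="{side}_wheel"/>
    <origin xyz="0 {y} 0" rpy="0 0 0"/>
    <axis xyz="0 1 0"/>
  </joint>"""

SKETCH_URDF = (
    '<robot name="beaker_sketch">'
    + BODY_LINK
    + WHEEL_TEMPLATE.format(side="left", y="0.04")
    + WHEEL_TEMPLATE.format(side="right", y="-0.04")
    + "\n</robot>"
)


def summarize_urdf(urdf_text):
    """Return (link names, joint tuples) found directly under <robot>."""
    root = ET.fromstring(urdf_text)
    links = [link.get("name") for link in root.findall("link")]
    joints = [
        (
            joint.get("name"),
            joint.get("type"),
            joint.find("parent").get("link"),
            joint.find("child").get("link"),
        )
        for joint in root.findall("joint")
    ]
    return links, joints


links, joints = summarize_urdf(SKETCH_URDF)
print(links)
print(joints)
```

Even this stripped-down version needs an inertial block per link and a parent/child/axis triple per joint, which is where most of the bulk comes from.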
Building the Environment
Now I need an environment in which to train my virtual Beaker. An “environment” means a combination of a physics engine, a virtual space for your robot to train, and some rule sets to know when your virtual robot succeeds or fails.
pybullet comes with some standard OpenAI Gym environments, and it’s (relatively) straightforward to take the classic inverted pendulum environment and swap out the model for a more realistic Beaker model.
The files needed to accomplish this were:
– the URDF file (duh)
– a BeakerBot class which wraps the URDF file
– a Beaker OpenAI Gym Env which loads BeakerBot
Tip: For more details on how to build your own environment, check out this post. Unfortunately I found it only after I’d done most of the work here, but it’s a fantastic reference, and it uses a balancing bot too! The only drawback is that it does not make use of the included env_base and robot_base classes that come with pyBullet, and they are cool.
Neat, we now have a world in which we can try to teach virtual Beaker something. Of course, he has no real brain yet, so when you hit “play” in the environment, he just falls over (poor guy):
Challenge: Crossing the Reality Gap
It’s easy to train a virtual robot. Getting the trained network to work on the real robot is a different matter, and its success depends on the fidelity of the virtual robot. Fidelity in this case refers to things like proper dimensions, mass, rotational inertia, motor behavior, etc. All of these characteristics of the real robot have to be reflected accurately enough in the virtual robot, otherwise the neural network trained on the virtual robot won’t work on the real one.
How NOT to get the model right
So, getting the model right is a challenge. How to go about it? At first I figured I’d just try this iteration cycle:
- train virtual model
- see network not work on real model
- try new model characteristics based on how robot failed.
- goto step 1
Looking at that loop gave me a sense of dread. Step 1 was fast (fast-ish), but step 2 seemed cumbersome. Worse, I had no idea if step 3 even made sense. How would I know which characteristics were not realistic enough? Ultimately I might have to do this, but first I need to think about all the ways I can get insight into the robot’s physical characteristics and model them.
How to get the model right — using PID!
I remembered I already had the real Beaker robot balancing pretty well, using our good old friend, the PID controller. I also had the PID values I used on the real robot. In Beaker’s case, I turned out to use a bit more than a standard PID controller, but that’s not important right now. The point is I could take the same algorithm (basically PID) and the same constants that made the real Beaker balance, and tweak the virtual model until the algorithm worked on the virtual Beaker with those constants. NOW the iteration cycle looks like this:
- see if virtual Beaker balances using PID constants from real robot
- see virtual Beaker fall.
- try new model characteristics
- goto step 1
Notice the real robot is not involved in this iteration cycle 🙂 So the hope is, once this process yields a balancing virtual Beaker using PID and the same constants as the real robot, my model will be accurate enough to then go in the OTHER direction: training a neural net on the model, and transferring it to the real robot. Fingers crossed!
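For reference, a textbook PID update can be sketched as below. This is only the standard form, not Beaker’s actual controller (which, as mentioned, does a bit more), and the `PID` class name and the gains in the usage lines are placeholders rather than the real robot’s constants.

```python
class PID:
    """Textbook PID controller; a sketch, not Beaker's actual balancing code."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the very first update

    def update(self, error, dt):
        """Return the control output for the current error and time step."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Placeholder gains, NOT the constants used on the real robot.
pid = PID(kp=2.0, ki=0.5, kd=0.1)
first = pid.update(error=1.0, dt=0.1)
second = pid.update(error=0.5, dt=0.1)
print(first, second)
```

The idea in the loop above is that the gains stay fixed at the real robot’s values, and only the model parameters change between iterations.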
Utilizing Existing PID Data
A while ago, I tuned the low-level PID controllers for Beaker’s motor drivers. This way, a high-level balancing algorithm could issue a rad/s command and not worry about how that gets executed. I wrote up how I tuned the low-level PID controllers in this blog post, which is cute because you can see Beaker in training wheels. So now I’m wondering what happens when I issue a rad/s command in pyBullet. How quickly do the wheels reach the desired speed? Ideally, the speed matches the real Beaker’s speed, seen here:
The above chart is from the blog post I linked above. Now I create a similar virtual “training wheel” setup for the virtual Beaker, and let’s watch the rise time:
Beaker has an inline gear-train in its motors as well as bevel gears (which translate the power 90 degrees). This yields something called backlash, which is basically loose “play” between the wheel and the motors. I can turn the wheels about 0.1512 rads (8.663 degrees) back and forth before feeling any resistance from the motor. That’s significant: imagine the last command was forward 10 rad/sec and the next command is -5 rad/sec. This change in wheel velocity direction will be registered by the wheel’s encoders right away, but the robot’s accelerometer won’t register the effect until a few loops later, once the wheels have “caught up” with the motor. I needed to model this backlash, and after some tinkering around I figured it out. See details and code here.
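For readers who don’t follow the link, one common way to model this kind of play is a simple dead-band: the wheel angle only moves when the motor angle has taken up all the slack. The sketch below is that generic model and not necessarily what the linked code does. It assumes 0.1512 rad is the total width of the play, ignores wheel inertia, and the `apply_backlash` name is hypothetical.

```python
BACKLASH_RAD = 0.1512  # play measured on the real robot; assumed to be the TOTAL width


def apply_backlash(motor_angle, wheel_angle, backlash=BACKLASH_RAD):
    """Return the new wheel angle given the motor angle and the old wheel angle.

    The wheel stays put while the motor moves inside the dead-band, and is
    dragged along (lagging by half the backlash) once the slack is taken up.
    """
    half = backlash / 2.0
    gap = motor_angle - wheel_angle
    if gap > half:       # motor is pushing the wheel forward
        return motor_angle - half
    if gap < -half:      # motor is pushing the wheel backward
        return motor_angle + half
    return wheel_angle   # motor is moving freely inside the slack


# Motor goes forward, then reverses; the wheel lags through the dead-band.
wheel = 0.0
trace = []
for motor in [0.05, 0.2, 0.1, 0.0]:
    wheel = apply_backlash(motor, wheel)
    trace.append(round(wheel, 4))
print(trace)
```

In the trace, the wheel ignores the first small motor move, follows once the slack is used up, then sits still during the reversal until the motor has crossed the whole dead-band.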
Demo of an orange disk that follows the green disk with 0.1512 rads of backlash:
Beaker uses two processing units: an Arduino Mega for low-level control and a Raspberry Pi for high-level control. We want to measure (and model) the time between the Pi issuing a wheel velocity change and that change showing up in an observation. Here is the script that starts the wheels spinning and tests how long it takes to see an observation with moving wheels. The result is an average of 44ms, with a range between 42 and 48ms. This makes sense, as the Arduino Mega only listens for commands from the Raspberry Pi once every 20ms. By default, issuing a velocity change and getting an observation in pyBullet are both instantaneous. That’s no good! There are two things that need to be done.
- The instantaneous velocity change is due to using the set_velocity() pybullet command. To fix this, I need to model the motor better. Let’s push that off until the next section since it’s not purely a latency issue.
- As far as instant observations, I’ll follow the lead of the paper I linked above, and keep an array of past observations. Instead of using the most recent one, I’ll use one that lines up with the one I would have seen given latency.
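The observation-history idea from the second bullet can be sketched with a small fixed-length buffer. This is a minimal version that picks the nearest whole simulation step. The 44ms figure is the measured average from above, while the 4ms simulation timestep and the `DelayedObservations` class are assumptions made for illustration.

```python
from collections import deque


class DelayedObservations:
    """Keep recent observations and hand back the one from `latency_s` ago."""

    def __init__(self, latency_s, timestep_s):
        # Number of whole simulation steps that best matches the latency.
        self.delay_steps = round(latency_s / timestep_s)
        # Holding delay_steps + 1 items means history[0] is exactly that old.
        self.history = deque(maxlen=self.delay_steps + 1)

    def push(self, observation):
        """Record the newest (instantaneous) simulator observation."""
        self.history.append(observation)

    def delayed(self):
        """Return the delayed observation (the oldest one until the buffer fills).

        Raises IndexError if nothing has been pushed yet.
        """
        return self.history[0]


# 44 ms measured latency; the 4 ms timestep is a hypothetical value.
buffer = DelayedObservations(latency_s=0.044, timestep_s=0.004)
seen = []
for step in range(20):
    buffer.push(step)            # stand-in for a real observation vector
    seen.append(buffer.delayed())
print(buffer.delay_steps, seen[-1])
```

During the first few steps the policy simply sees the oldest observation available, which is one possible choice for the warm-up period.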
Modelling the Motors
Sigh. It would be nice if I could just use the set_velocity command on the model. Unfortunately, that’s an instantaneous change. There are two things I can try. The first is to model a DC motor by figuring out the various constants that describe it. The second way, which I think is easier and which I’d like to try first, is to emulate what the real robot’s PID controllers do. Their job is to maintain a rotational velocity. So rather than issue the final velocity in simulation, I can issue small changes in velocity that yield a curve similar to the one in the PID chart shown above.
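One simple way to produce such a curve is a first-order lag: on each simulation step, move the commanded velocity a fraction of the way toward the target. This is only a sketch of the idea. The `ramp_velocity` name, the timestep, and the time constant are placeholders that would have to be fit to the real rise-time chart, and a pure first-order lag never overshoots, so it cannot reproduce any overshoot the real controllers may show.

```python
def ramp_velocity(current, target, dt, time_constant):
    """Move `current` toward `target` like a first-order lag.

    `time_constant` (seconds) controls how fast the velocity rises; it is a
    tuning knob to be matched against the real robot's rise time.
    """
    alpha = min(1.0, dt / time_constant)  # never step past the target
    return current + alpha * (target - current)


# Placeholder numbers: 10 ms steps, 100 ms time constant, 10 rad/s target.
velocity = 0.0
curve = []
for _ in range(100):
    velocity = ramp_velocity(velocity, target=10.0, dt=0.01, time_constant=0.1)
    curve.append(velocity)  # this is the value to hand to the simulator each step
print(curve[0], curve[-1])
```

Each value in `curve` would be issued to the simulated wheel in place of the final target, so the wheel speed rises over many steps instead of jumping.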
PID Results in Virtual
Above, we have Beaker balancing via PID. The problem is that there is a big discrepancy between the PID values in virtual and in real life:
Now we have a model that seems pretty accurate. Woohoo! The last step is creating a client that will train a neural network.
I mostly use TensorFlow. The environment I created above has a continuous action space, so I decided to go with the DDPG algorithm. The good folks at OpenAI have conveniently created the baselines repo, which includes DDPG, so it was pretty straightforward to apply it to my virtual Beaker bot. Even though DDPG is fancy-schmancy, it’s still a tabula rasa effort, so it took a few thousand episodes to get good performance. I’m sure there are tweaks I could have made to get training done in fewer episodes, but that’s not the point of this current exercise.
Here is how virtual Beaker looked after X episodes:
Brain Transplant Time
At last it’s time for the actual transfer part of transfer learning! This is basically taking the neural network trained in simulation, placing it into Beaker, and seeing if Beaker balances…