Neural Network Powered Physics Engine

TL;DR: Did not get this to work very well. Still glad I did it.

The Problem

If you want to control a real-world (as opposed to virtual) robot with a neural network, then you need to train that network. There are a few ways to do that, each with their pros and cons.

  1. If you train the real-world robot from a brand-new network (tabula rasa) using reinforcement learning, then the robot is going to flop around, potentially hurt itself, be a burden to reset, and, importantly, take a really long time to train.
  2. If you train the network in virtual and then try to transfer it to the real robot, then you are in for a world of pain trying to make sure the virtual robot model (and physics engine) are realistic enough that the neural network, once transferred, actually controls the real robot successfully.

So what to do?

Often we look to neural networks for robotic control so we can outperform some existing hand-crafted control system. That control system might be as simple as PID, or something a bit more hefty like LQR, or something really complex and esoteric. The point is, before turning to neural networks, people usually have a baseline to compare against.

What can we do with that baseline? How can we bootstrap from it? How can we bootstrap from it… a lot??

Hit record, check Reddit for a few minutes, stop recording

If you have an existing (non-neural network) control system, that means you already have access to the robot’s state, and of course you have access to your existing control system’s output. So you can create a log that stores the state and action:

s1a, s1b, s1c, ... s1n,a1
s2a, s2b, s2c, ... s2n,a2

In the above lines, we assume the state is composed of n parameters (s1a through s1n), with a single scalar (a1) as the action.
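Concretely, the logging loop can be as simple as writing one CSV row per control tick. Below is a minimal Python sketch; get_state and controller are hypothetical stand-ins for however your robot exposes its sensors and its existing control law:

import csv
import random

# Hypothetical stand-ins: on real hardware, get_state() would read the
# robot's sensors and controller() would be the existing (e.g. PID) law.
def get_state():
    return [random.random() for _ in range(4)]  # e.g. angle, angular vel, pos, vel

def controller(state):
    return -2.0 * state[0]  # placeholder control law, returns a scalar action

with open("robot_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for _ in range(10000):  # one row per control tick: state..., action
        state = get_state()
        action = controller(state)
        writer.writerow(state + [action])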

Now, what if we decided to train… a physics engine! This is not a new idea. We reconfigure the log output from above into input/expected-output pairs for the NN:

Input: (s1a, s1b, s1c,...s1n, a1)
Expected Output: (s2a, s2b, s2c,...s2n)

Now we can just do some supervised learning. Of course, the physics engine will only be as good as the recorded data it is trained on. That means you need to take your real robot and record it doing both the things you want it to do, like walking, and totally random things.
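Sketched in Keras (the .h5 filenames below suggest Keras model saves), that pipeline looks something like this. It is a sketch rather than my exact script: it assumes the CSV log format from above, assumes consecutive rows are consecutive control ticks, and hard-codes one of the network shapes tried below:

import numpy as np
from tensorflow import keras

# Each log row is n state values followed by one action.
log = np.loadtxt("robot_log.csv", delimiter=",")
states, actions = log[:, :-1], log[:, -1:]

# Pair each (state, action) with the state recorded one tick later.
x = np.hstack([states[:-1], actions[:-1]])  # (s_t, a_t)
y = states[1:]                              # s_{t+1}

n = states.shape[1]
model = keras.Sequential([
    keras.layers.Input(shape=(n + 1,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(n),  # predict the next state directly
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, batch_size=32, epochs=300, validation_split=0.2)
model.save("model_world.h5")

Holding out a validation split is what lets us watch for the overfitting discussed below.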

Why try to make an NN-powered physics engine based on your existing robot? Because then, as long as you have enough data to train on, the NN will yield a physics engine that’s amazingly accurate. No more tinkering around with things like the Bullet Physics Engine (which is awesome, btw).

To be clear, I’m not trying to create a full physics engine. I’m just trying to create an engine that can predict a single robot moving on a flat surface.

Does it Work?

Even though it’s sort of cheating, for my first experiments I used a virtual balancing robot with a PID controller to record data, and tried to create a physics engine from that. I used that recorded data to train an NN as explained above. Here are some different formulations of the network, with associated losses (and validation losses):

model_world_h32x2.L3_b32_e300_0.009538_.h5.png
Hidden: 32, Layers: 3

Above you can see the training loss is just a tad better than the validation loss. Probably acceptable.

model_world_h64x3.L3_b32_e500_0.009604_.h5.png
Hidden: 64, Layers: 3

Above we see way too much divergence between training and validation loss. This means we are overfitting.

model_world_h64x2.L3_b32_e500_0.009643_.h5.png
Hidden: 64, Layers: 2

Above we can see that even when we drop down to two layers, there is still a lot of overfitting.

model_world_h16x4.L3_b32_e300_0.011416_.h5.png
Hidden: 16, Layers: 3

This also seems comparable to the 32-unit, 3-layer network above. So how do these models actually perform?

So-so. Here I am running an environment, but instead of using the Bullet physics engine, I’m using the trained NN. The agent is applying no power to the robot, so the balancing robot just falls over. The fact that it falls over, as seen below, is pretty neat.

recording_of_pe.gif
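For reference, the rollout behind the gif above amounts to feeding the network’s own prediction back in as the next state. A minimal sketch, assuming the model trained earlier (the initial state values here are made up):

import numpy as np
from tensorflow import keras

model = keras.models.load_model("model_world.h5")

state = np.array([0.05, 0.0, 0.0, 0.0])  # e.g. a small initial tilt, at rest
for _ in range(200):
    action = 0.0  # no power applied, so the robot should fall over
    inp = np.append(state, action)[None, :]   # shape (1, n + 1)
    state = model.predict(inp, verbose=0)[0]  # the NN plays physics engine

Note that every prediction error gets fed back in as the next input, so errors compound over the rollout.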

While promising, it turns out that the NN does not compete well with a real physics engine, even on a simple task like the one above. I think that something like this might be good for rough planning, but not good for training.