A Winch for Reinforcement Learning


Reinforcement Learning Pain When Training a Real Robot

Reinforcement learning often takes thousands, or even hundreds of thousands, of episodes before the neural network is trained well enough to accomplish a task. We are getting better at methods to reduce the number of episodes, but for now it's still a laborious process. When working with a real robot (vs. a simulation), the robot must be reset between episodes. This usually means picking up a fallen robot, or moving it back to a specific position, so it can try again. Even if we put aside the damage an untrained robot can suffer during these episodes, repeatedly setting up the robot for each episode is tremendously time consuming.

The Fantasy

The fantasy solution is that I’m off on vacation while my robot is training itself, 24/7, without any need of a human to help it. I figured it was worth trying to make this fantasy come true. So here is that adventure.

The Winch

I built a winch for Beaker, my two-wheeled balancing robot. The winch had two main jobs, equally important:

  1. Protect the balancing robot so when it falls, it does not fall all the way down.
  2. Reset the robot for the next episode.

Quick Note on Beaker and The Winch communication

The robot (Beaker) has a Raspberry Pi running TensorFlow, and an Arduino for low-level commands to the hardware. When an episode is complete, events unfold as follows:

  1. The Raspberry Pi tells the Arduino to issue a wireless command to the winch: “Reset me, please!”
  2. The winch lifts Beaker in the air, waits a moment, then slowly places Beaker back down on the ground. At this point Beaker is completely upright, wheels barely touching the ground.
  3. Once Beaker is reset, the winch says, “You are all good, try again!”
  4. Beaker tells the winch, “give me slack!”, and then immediately starts the episode, trying to balance.
  5. Eventually the episode ends, with Beaker falling or the episode timing out if Beaker balances long enough. At this point, go to step 1.
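The handshake above can be sketched as a tiny simulation. This is a hypothetical illustration, not the actual code: the message strings, class names, and the `run_episode` helper are all invented, and the real system exchanges these messages over a wireless link between the Arduinos.

```python
# Hypothetical sketch of the episode-reset handshake between Beaker and
# the winch. Message names are invented for illustration only.
RESET_ME, ALL_GOOD, GIVE_SLACK = "reset me", "all good", "give slack"

class Winch:
    def __init__(self):
        self.log = []  # records the winch's actions, for inspection

    def handle(self, msg):
        if msg == RESET_ME:
            # Step 2: lift Beaker, pause, then lower until wheels barely touch.
            self.log += ["lift", "pause", "lower"]
            return ALL_GOOD            # step 3: "You are all good, try again!"
        if msg == GIVE_SLACK:
            self.log.append("slack")   # step 4: pay out cable for the episode
            return None

def run_episode(winch):
    """Steps 1-5: request a reset, wait for the ack, ask for slack, balance."""
    reply = winch.handle(RESET_ME)     # step 1: "Reset me, please!"
    assert reply == ALL_GOOD           # step 3: winch confirms the reset
    winch.handle(GIVE_SLACK)           # step 4: Beaker asks for slack...
    return "episode ran"               # step 5: ...and balances until fall/timeout

winch = Winch()
run_episode(winch)
print(winch.log)  # ['lift', 'pause', 'lower', 'slack']
```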



I’ll talk about each component:

  1. Arduino: A standard Uno. I used it because I had a spare one lying around. It could have been anything.
  2. Circuitry: Going to talk about what’s in the box below. In brief it houses the receiver and motor controller.
  3. Nema 23 Stepper Motor: I started with a stepper because I originally thought I could dead-reckon how much to lift. That is, I thought I could measure how far I needed to lift the robot, record that, and just lift the same amount each time. This was silly of me. But I'd already purchased the motor, so it's what I used. More on this below.
  4. Pulley: Yeah, it's a pulley. The cable is a nylon cable that's really strong and not elastic.
  5. 5kg Load Cell: This is how I solved the problem of knowing when to stop lifting, and when the robot was safely back on the ground.
  6. Pulleys hanging off the Load Cell: This is how I transferred the weight from the main drive shaft to the load cell.

The Circuitry


  1. Load Cell Circuit. This came with the load cell. Simple to hook up. There are lots of tutorials out there.
  2. Receiver. These cheap little transceivers are fantastic.
  3. Motor Driver.
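The load-cell circuit outputs raw ADC counts that have to be converted to a weight. The post doesn't name the amplifier board (a common one for hobby load cells is the HX711), so here is a generic, hypothetical calibration sketch: take one reading with nothing on the cell and one with a known weight, then convert from there. All numbers and function names are made up for illustration.

```python
# Hypothetical two-point calibration for a load-cell amplifier.
# Raw ADC counts are converted to grams via a tare offset and a
# counts-per-gram scale factor. Values below are invented examples.

def calibrate(raw_empty, raw_known, known_grams):
    """Derive tare offset and counts-per-gram from two reference readings."""
    offset = raw_empty                            # reading with nothing attached
    scale = (raw_known - raw_empty) / known_grams # counts per gram
    return offset, scale

def to_grams(raw, offset, scale):
    """Convert a raw reading to grams using the calibration."""
    return (raw - offset) / scale

# Example: empty cell reads 8400 counts, a 1 kg test weight reads 92400.
offset, scale = calibrate(raw_empty=8400, raw_known=92400, known_grams=1000)
print(to_grams(50400, offset, scale))  # 500.0
```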

Power Supply

I wanted a single power supply to power both Beaker and the winch, so I purchased a 12 V, 5 A power supply. For the winch, the power went directly into the motor driver, with a split going directly to the Arduino Uno. (12 V is the upper limit of the Uno's recommended input voltage.)

Why A Load Cell?

The winch must know when to stop lifting, and once the robot is in the air, it must know when to stop lowering. There are lots of ways I could have done this. External cameras, switches, scales on the floor, sensors on the robot, etc… I went with a load cell because:

  1. It's very easily adaptable to other robots. Just enter the robot's weight into a constant and you are done.
  2. Easier than the alternatives. A camera, a switch, or something else would have worked, but the load cell works just as well and is very easy to set up.
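The two stop conditions implied above can be written as a couple of threshold checks against the robot's weight constant. This is a sketch of the idea, not the winch's actual firmware: the weight and tolerance values are invented, and in practice you would smooth the load-cell readings before comparing.

```python
# Hypothetical stop conditions for the winch, based on load-cell readings.
ROBOT_WEIGHT_G = 1200   # "enter the robot's weight into a constant" (invented value)
TOLERANCE_G = 50        # margin for sensor noise (invented value)

def fully_lifted(load_g):
    """Stop lifting once the cell carries (nearly) the robot's full weight."""
    return load_g >= ROBOT_WEIGHT_G - TOLERANCE_G

def safely_grounded(load_g):
    """Stop lowering once the ground carries nearly all of the weight."""
    return load_g <= TOLERANCE_G

print(fully_lifted(1180), safely_grounded(30))  # True True
```

Because the logic only compares against a single weight constant, adapting the winch to a different robot really is just a matter of changing `ROBOT_WEIGHT_G`.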

The Code

The code is here. It’s pretty straightforward. If you have any questions about it, ask me 🙂


Ultimately, I never got Beaker training via tabula rasa reinforcement learning using the winch 😦 I believe the reason is that, even though the winch was good at its job, there was a bit too much variation in Beaker's position each time it was set back down at the start of an episode. Given enough episodes, I believe Beaker would have learned to balance with this setup.