Build a Balancing Bot with OpenAI Gym, Pt I: Setting up

When I started working with OpenAI Gym, one of the things I was looking forward to was writing my own environment and having one of the available algorithms derive a model for it. Creating an environment is not obvious, so it took some experimentation until I got it right. I decided to write a tutorial series for those who would like to create their own environments in the future, using an example task that is fun and simple, yet extendable: a balancing bot.

This post assumes that you have some understanding of Reinforcement Learning principles. Specific understanding of Deep-RL algorithms is not necessary, although it wouldn’t hurt. Also, it is good if you have some familiarity with Python and tools such as pip, Miniconda and setuptools.

This is part I of the series, covering setting up the OpenAI Gym environment. The second part can be found here.


We will be using OpenAI Gym to implement the Balancing Bot task. In addition we will be using Baselines and pyBullet. Baselines is an OpenAI project that includes implementations of several state-of-the-art reinforcement learning algorithms. Using Baselines will allow us to focus on creating the environment and not worry about training the agent. Since our environment is defined by physics, we will use pyBullet to perform the necessary computations and visualize experiment progress. pyBullet is easy to set up and use, and includes a viewport module that has lots of visualization and interaction features built in.

At the end of this two-part tutorial we will be able to train a balancing bot controller with just a few lines of code. As an example, below is the code that we’ll use:

If the above code doesn’t make much sense right now, hang on. It will become clear at the end of this post.

Setting up a Python Environment

I recommend setting up a Miniconda environment prior to working with projects such as this one. Once you have Miniconda installed, you can create a new Python environment and switch to it as follows:

Once inside the Miniconda environment you created, you can install packages using pip as usual. Go ahead and install some basic packages that will be used in this tutorial:

Structuring the balancing bot project

The next step is to create the Gym environment structure. OpenAI has included an informative guide to the folder structure of an environment, which we will follow in this tutorial. Our folder structure will look as follows:

Go ahead and create this structure in your project folder. Once you are finished, it’s time to start creating the individual files.
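If you prefer to script this step, the skeleton can be created from Python. The following is only a sketch: the directory names come from this tutorial, but the file names (README.md, setup.py, the two __init__.py files and balancebot_env.py) are assumptions based on the conventional layout of a pip-installable Gym environment, so adjust them to match your own project.

```python
from pathlib import Path
import tempfile

# Directory names come from the tutorial; the file names are assumptions
# based on the conventional layout of a pip-installable Gym environment.
SKELETON_FILES = [
    "README.md",
    "setup.py",
    "balance_bot/__init__.py",
    "balance_bot/envs/__init__.py",
    "balance_bot/envs/balancebot_env.py",  # hypothetical module name
]


def create_skeleton(parent: Path) -> Path:
    """Create an empty balance-bot project skeleton under `parent`."""
    root = parent / "balance-bot"
    for relative in SKELETON_FILES:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch(exist_ok=True)  # existing files are left untouched
    return root


# Demonstrate in a throwaway directory so nothing is written to your project.
with tempfile.TemporaryDirectory() as tmp:
    project_root = create_skeleton(Path(tmp))
    created = sorted(
        p.relative_to(project_root).as_posix()
        for p in project_root.rglob("*")
        if p.is_file()
    )
    print("\n".join(created))
```

To create the files for real, call create_skeleton(Path.cwd()) from the folder that should contain the project.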


balance-bot/ includes the following:

Mostly standard setuptools stuff. Notably, the baselines package is missing from the requirements. This is because the environment itself is independent of the agents, so anyone may choose a different agent to solve the problem represented by the environment.

Next, balance-bot/balance_bot/ includes the following:

Here the use of register() calls for some clarification. In our first code block above, we used the gym.make() function to instantiate our environment, and later passed it to the training function. gym.make() accepts an id (a string) and looks for environments registered with OpenAI Gym that have this id. If it finds one, it performs the necessary instantiation and returns a handle to the environment. All further interaction with the environment is done through that handle. Without going into too much detail, the register() function above is what tells Gym where to find the environment class and what name it should give it.
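To make the relationship between register() and make() more concrete, here is a toy registry written from scratch. It is a simplified stand-in and not Gym's actual implementation: ToyRegistry, DummyBalancebotEnv and the id "balancebot-v0" are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict


class ToyRegistry:
    """Simplified stand-in for an environment registry (not Gym's code)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, id: str, entry_point: Callable[[], object]) -> None:
        """Remember which factory builds the environment with this id."""
        if id in self._factories:
            raise ValueError(f"Cannot re-register id: {id}")
        self._factories[id] = entry_point

    def make(self, id: str) -> object:
        """Look up the id and return a freshly built environment."""
        try:
            factory = self._factories[id]
        except KeyError:
            raise KeyError(f"No registered env with id: {id}") from None
        return factory()


class DummyBalancebotEnv:
    """Placeholder for the real environment class."""


registry = ToyRegistry()
registry.register(id="balancebot-v0", entry_point=DummyBalancebotEnv)

# The returned object plays the role of the environment handle.
env = registry.make("balancebot-v0")
print(type(env).__name__)
```

The point of the indirection is that the caller only needs the id string; which class gets instantiated is decided once, at registration time.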

The entry_point parameter specifies the subclass of the Env class that describes our environment. Implementation of the class will be the subject of the second part of this tutorial series. Note that the entry_point parameter value has a specific format: {base module name}.envs:{Env subclass name}. Even though BalancebotEnv resides within another module, Gym doesn’t need to know about it because the class is imported at balance_bot/envs/
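As an illustration of the {module}:{class} format, the sketch below resolves such a string with the standard library. The helper load_entry_point is a hypothetical name, and this is an assumption about the general technique rather than a copy of what Gym does internally. It is demonstrated on standard-library modules so that it runs without our package being installed.

```python
import importlib


def load_entry_point(entry_point: str):
    """Resolve a 'package.module:AttributeName' string to the named object."""
    module_name, sep, attr_name = entry_point.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected 'module:attribute', got {entry_point!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library class so it runs anywhere.
ordered_dict_cls = load_entry_point("collections:OrderedDict")
print(ordered_dict_cls)
```

Once the balance_bot package is installed, a string such as balance_bot.envs:BalancebotEnv could be resolved in the same way, which is why the class only needs to be importable from the envs package.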

At this point, Gym expects the id of our environment to follow the pattern {name}-v{version}. This is clearly a decision to help versioning of environments, so we’ll just stick to it for now.
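The naming convention can be checked with a small regular expression. The pattern below is a simplified approximation written for this post, not Gym's own validation rule, and the id "balancebot-v0" is only an example.

```python
import re

# Simplified approximation of the {name}-v{version} convention;
# Gym's own validation may accept or reject slightly different ids.
ENV_ID_PATTERN = re.compile(r"^(?P<name>[\w.:-]+)-v(?P<version>\d+)$")


def parse_env_id(env_id: str):
    """Split an id such as 'balancebot-v0' into (name, version)."""
    match = ENV_ID_PATTERN.match(env_id)
    if match is None:
        raise ValueError(f"Id does not follow name-vN: {env_id!r}")
    return match.group("name"), int(match.group("version"))


name, version = parse_env_id("balancebot-v0")
print(name, version)
```

Keeping the version in the id means a changed environment can be published under a new number while results obtained with the old one stay comparable.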

Next comes balance-bot/balance_bot/envs/

As mentioned earlier, this file simply imports BalancebotEnv from the corresponding Python module, so that it is available to the register() function above.

Finally, the balance-bot/balance_bot/envs/ file is the one that will contain the actual environment class. Below is a stripped-down version of the file contents, which we will be fleshing out later on:
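For orientation, a bare skeleton of an Env subclass might look like the sketch below. It assumes the step/reset/render/close method names of Gym's Env interface; depending on your Gym version the expected names and signatures may differ, so check the documentation for the version you installed. The fallback to a plain object base class is only there so the sketch runs on a machine without Gym.

```python
try:
    import gym
    BaseEnv = gym.Env
except ImportError:
    # Assumption for illustration only: lets the sketch run without Gym.
    BaseEnv = object


class BalancebotEnv(BaseEnv):
    """Skeleton environment; every method is a placeholder for now."""

    def __init__(self):
        super().__init__()

    def step(self, action):
        # Will apply the action and advance the simulation by one step.
        raise NotImplementedError

    def reset(self):
        # Will put the simulation back into its initial state.
        raise NotImplementedError

    def render(self, mode="human"):
        # Will display the current state of the simulation.
        raise NotImplementedError

    def close(self):
        # Nothing to clean up yet.
        pass


env = BalancebotEnv()
print(type(env).__name__)
```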

At this point we are done with the preliminary environment setup, and we can start fleshing out our Env subclass. Before doing that, though, let’s try a pip install in our newly created environment. From the root of your project:

This should install the environment in editable mode, which means that changes you make to your files inside balance-bot will affect the installed package as well.
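One quick way to sanity-check the install is to ask Python whether the package can be located. This is a small sketch using only the standard library; is_installed is a hypothetical helper name, and balance_bot is assumed to be the importable package name, matching the folder of the same name.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


# Prints True only after the package has been installed in this environment.
print("balance_bot importable:", is_installed("balance_bot"))
```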


This was the first in a tutorial series that shows how to create a custom environment for reinforcement learning, using OpenAI Gym, Baselines and pyBullet. In this post we discussed how to structure the project environment and files. Our environment is not yet functional, but take a look at the video below for a glimpse of what we’ll be building in the end:

Go ahead and visit the second part of this tutorial.

