Building a self-driving car from scratch!

Arunesh Mishra
Dec 16, 2020

Updated: Dec 17, 2020

In this post, we will learn to develop a fully autonomous car with Udacity's Car Simulator and Python. I'll describe various approaches for preprocessing the dataset obtained from the simulator. Then I will recreate NVIDIA's End-to-End Self-Driving neural network from scratch and train it. I have tried to keep this post as non-technical as possible so that a general audience can follow along and enjoy it. Before we move forward, here is what the final product looks like! Enjoy :)

I have divided this post into 4 parts to make it easier to understand.

Obtaining the dataset
Pre-processing the dataset
Building & Training the NVIDIA Model
Testing our car!

Platform: I would recommend using Anaconda's Spyder for a smoother process, because Spyder shows the code editor and the Python console in the same window, which I find very easy to use. It also has a file explorer pane that lets you easily collect and manage data while writing code. P.S. Anaconda comes with most of the packages required for building AI/ML models.

Let's get started!

Obtaining the Dataset

I used Udacity's Car Simulator (training mode) to gather the dataset. You can download the software from GitHub. A little bit about the simulator: the car drives on one of the two available tracks (the 1st track was used for this project), and three cameras are mounted on the front left, center, and right of the car. The simulator has a record button at the top right that records the data frame by frame.

(Images: left, center, and right camera views)

For every single image (X sample), there is a corresponding steering angle (Y sample). Once the setup was ready, I played around in training mode to get familiar with the throttle and steering. For this project, I recorded 1200 images to train the model. While 1200 may seem a small number for a self-driving car project, it worked out for a beginner project. Moreover, the more images we train on, the more processing power training takes, so given my laptop's specs I had to make this trade-off.
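To make the X/Y pairing concrete, here is a minimal sketch of reading the recorded data. It assumes the simulator writes a driving_log.csv whose columns are the center, left, and right image paths followed by steering, throttle, brake, and speed; that column order, the optional header check, and the choice to use only the center camera are assumptions for illustration, so check them against your own file.

```python
import csv

# Assumed column order of the simulator's driving_log.csv (hypothetical; check your file).
COLUMNS = ["center", "left", "right", "steering", "throttle", "brake", "speed"]


def parse_driving_log(lines):
    """Turn CSV lines into (center_image_paths, steering_angles): the X and y samples."""
    image_paths, steering_angles = [], []
    for row in csv.reader(lines):
        if not row or row[0].strip() == "center":  # skip blank lines and an optional header
            continue
        record = dict(zip(COLUMNS, (value.strip() for value in row)))
        image_paths.append(record["center"])
        steering_angles.append(float(record["steering"]))
    return image_paths, steering_angles


def load_driving_log(csv_path):
    """Read the log file written by the simulator's record button."""
    with open(csv_path, newline="") as f:
        return parse_driving_log(f)
```

Usage would look like `paths, angles = load_driving_log("data/driving_log.csv")`, where the path is hypothetical. Splitting parsing from file reading keeps the parser easy to test on a few in-memory lines.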

Pre-Processing the Dataset

Optimization is a big part of building deep learning models, and it becomes especially important when images are involved. In this project, for instance, we can crop the sky out of each image, as it doesn't help with training; fewer pixels per image also saves processing power and memory. Further, we can convert the cropped (color) image to the YUV color space, which NVIDIA's model also takes as input; YUV allows more efficient coding and uses less bandwidth than RGB. Again, these are the kinds of trade-offs data scientists usually make to save resources!

(Image: resulting preprocessed image)
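The preprocessing steps can be sketched with NumPy alone. This version assumes 160x320 RGB frames, crops rows 60 to 134 (hypothetical bounds that remove the sky and the car's hood), converts to YUV with the standard BT.601 coefficients, and resizes to the 66x200 input the model expects using nearest-neighbour sampling. OpenCV's `cv2.cvtColor(image, cv2.COLOR_RGB2YUV)` and `cv2.resize` are the more common tools for the same job; the pure-NumPy route just keeps the sketch dependency-light.

```python
import numpy as np

# BT.601 RGB -> YUV matrix; rows produce Y, U and V respectively.
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])


def crop_sky_and_hood(image, top=60, bottom=135):
    """Keep rows top..bottom-1, dropping the sky and the hood (hypothetical bounds)."""
    return image[top:bottom]


def rgb_to_yuv(image):
    """Convert an HxWx3 RGB image (0-255) to YUV floats: Y in [0, 1], U and V centred on 0."""
    return (image.astype(np.float32) / 255.0) @ RGB_TO_YUV.T


def resize_nearest(image, height=66, width=200):
    """Nearest-neighbour resize to the 66x200 input shape used by the model."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols[None, :]]


def preprocess(image):
    """Crop, convert to YUV, then resize: the same steps must run at training and driving time."""
    return resize_nearest(rgb_to_yuv(crop_sky_and_hood(image)))
```

Whatever exact steps you choose, the key design point is that the function applied to training images must also be applied, unchanged, to the live frames when the car drives.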

Building & Training the NVIDIA Model

I used NVIDIA's End-to-End Deep Learning model as inspiration for this project. The full blog is available at this link. The model is based on the PilotNet architecture, which is composed of nine layers:

Five convolutional layers. These layers form a convolutional neural network (CNN), which plays a big part in computer vision, namely in learning features from input images.
Three dense layers and a normalization layer
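A quick worked example helps show what the five convolutional layers do to a 66x200 input. With Keras' default "valid" padding, each convolution shrinks a dimension to (size - kernel) // stride + 1. The kernel sizes and strides below are read off the model code in this post; the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride):
    """Output length of a 'valid' (unpadded) convolution along one dimension."""
    return (size - kernel) // stride + 1


# (filters, kernel, stride) for the five convolutional layers in nvidia_model()
CONV_LAYERS = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]


def trace_shapes(height=66, width=200):
    """Return the (height, width, channels) output of each convolutional layer."""
    shapes = []
    for filters, kernel, stride in CONV_LAYERS:
        height = conv_output_size(height, kernel, stride)
        width = conv_output_size(width, kernel, stride)
        shapes.append((height, width, filters))
    return shapes


for shape in trace_shapes():
    print(shape)
# (31, 98, 24)
# (14, 47, 36)
# (5, 22, 48)
# (3, 20, 64)
# (1, 18, 64)
```

The last feature map is 1x18x64, so the Flatten layer hands 1 * 18 * 64 = 1152 values to the first dense layer.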

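You will notice that I used 'elu' (not 'relu') for all the layers. The reason is that 'relu' outputs only values from 0 to max, meaning it filters out any negative values fed into it, and as you may have noticed on the right side of the first video, steering angles can be negative! (Strictly speaking, the final Dense(1) layer has no activation, so the network can output negative angles either way; ELU's practical advantage is that hidden units still pass a small gradient for negative inputs instead of getting stuck at zero.) I also added Dropout layers with a rate of 50%, meaning half of the preceding layer's outputs are randomly turned off at each training step. Dropout is a regularization technique: it stops the network from relying too heavily on any single node, which reduces overfitting.

Here is the code snippet: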

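# Keras imports (TensorFlow 2.x paths)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam


def nvidia_model():
    model = Sequential()

    # Five convolutional layers: three 5x5 with stride 2, then two 3x3 with stride 1
    model.add(Conv2D(24, (5, 5), strides=(2, 2), input_shape=(66, 200, 3), activation='elu'))
    model.add(Conv2D(36, (5, 5), strides=(2, 2), activation='elu'))
    model.add(Conv2D(48, (5, 5), strides=(2, 2), activation='elu'))
    model.add(Conv2D(64, (3, 3), activation='elu'))
    model.add(Conv2D(64, (3, 3), activation='elu'))
    model.add(Dropout(0.5))

    # Flatten the feature maps into one vector for the dense layers
    model.add(Flatten())

    # Fully connected layers, each followed by 50% dropout
    model.add(Dense(100, activation='elu'))
    model.add(Dropout(0.5))
    model.add(Dense(50, activation='elu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='elu'))
    model.add(Dropout(0.5))

    # Single linear output: the predicted steering angle
    model.add(Dense(1))
    model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
    return model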
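Training and validation losses imply the data was split into two sets. The sketch below assumes the preprocessed images and angles are already NumPy arrays `X` and `y`, and that 20% of the data is held out for validation (a hypothetical choice); scikit-learn's `train_test_split` does the same job if you prefer a library call.

```python
import numpy as np


def train_val_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle once, then hold out val_fraction of the samples for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]


# Hypothetical usage with nvidia_model() (requires TensorFlow):
# X_train, X_val, y_train, y_val = train_val_split(X, y)
# model = nvidia_model()
# history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
#                     epochs=30, batch_size=32, shuffle=True)
# model.save('model.h5')
```

With only about 1200 images, everything fits in memory, so passing arrays straight to `model.fit` is simpler than writing a batch generator; `history.history['loss']` and `history.history['val_loss']` then hold the two loss curves.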

Loss results from training: the loss stayed below 0.1% in both the training and validation phases, suggesting that the model fit the data well.

Testing out the model!

Once the model was trained and saved, a client-server connection was made between the Python console and the Udacity simulator using Flask and Socket.IO (code available on GitHub). This part of the project was new to me, as I had never worked with Flask or Socket.IO, so most of the code was inspired by Stack Overflow. Once the connection was made, the simulator sent images in real time to the Python console, where each image was preprocessed just as in training. The trained model then predicted the steering angle, and these angles were sent back to the simulator in real time, which drove the car!
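Here is a minimal sketch of that loop, modeled loosely on the drive.py script Udacity distributes with its simulator. The event names ('telemetry', 'steer'), the message fields, port 4567, the proportional throttle rule, and `SPEED_LIMIT` are assumptions taken from that script rather than from this post, and `preprocess` stands for whatever function you used on the training images. The simulator's Socket.IO client is old, so you may need to pin older python-socketio and python-engineio versions for it to connect.

```python
import base64
from io import BytesIO

SPEED_LIMIT = 10.0  # hypothetical target speed


def throttle_for(speed, speed_limit=SPEED_LIMIT):
    """Proportional throttle: accelerate hard when slow, ease off near the limit."""
    return 1.0 - speed / speed_limit


def steer_payload(steering_angle, throttle):
    """Build the 'steer' message; the simulator expects the values as strings."""
    return {"steering_angle": str(steering_angle), "throttle": str(throttle)}


def run_server(preprocess, model_path="model.h5", port=4567):
    """Serve predictions to the simulator (needs python-socketio, eventlet, flask, pillow, tensorflow)."""
    import eventlet.wsgi
    import numpy as np
    import socketio
    from flask import Flask
    from PIL import Image
    from tensorflow.keras.models import load_model

    sio = socketio.Server()
    model = load_model(model_path)

    @sio.on("telemetry")
    def telemetry(sid, data):
        if not data:  # no telemetry (e.g. manual mode): nothing to predict
            return
        speed = float(data["speed"])
        frame = np.asarray(Image.open(BytesIO(base64.b64decode(data["image"]))))
        batch = np.array([preprocess(frame)])  # same preprocessing as training
        steering_angle = float(model.predict(batch, verbose=0)[0][0])
        sio.emit("steer", data=steer_payload(steering_angle, throttle_for(speed)))

    @sio.on("connect")
    def connect(sid, environ):
        sio.emit("steer", data=steer_payload(0.0, 0.0))  # start with the car at rest

    app = socketio.WSGIApp(sio, Flask(__name__))
    eventlet.wsgi.server(eventlet.listen(("", port)), app)
```

You would start it with something like `run_server(preprocess)` and then launch the simulator in autonomous mode. The throttle and message helpers are kept outside `run_server` so they can be checked without a simulator.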

Improvements for Future Projects

Data augmentation is a technique for artificially creating new training data from existing training data, for example by tilting or resizing the training images. In real-world applications, large datasets are often unavailable for many reasons, so data augmentation can come in very handy.
Other improvements include increasing the number of epochs (30 epochs were used for this project, which took approximately 6 minutes to train), using a machine with more processing power, and collecting higher-resolution images, for instance 1400x1050 instead of the 800x600 used in this project.
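As a rough illustration, here is a NumPy sketch of a few augmentations. Horizontal flipping (with the steering angle negated, because a mirrored left turn becomes a right turn) and brightness changes are common choices I am adding for illustration; `zoom` stands in for the resizing idea above. All function names, probabilities, and ranges are hypothetical.

```python
import numpy as np


def flip(image, steering_angle):
    """Mirror the frame left-right; the road now curves the other way, so negate the angle."""
    return np.fliplr(image).copy(), -steering_angle


def adjust_brightness(image, factor):
    """Scale pixel intensities (e.g. 0.5 darker, 1.3 brighter), clipped to 0-255."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def zoom(image, scale):
    """Zoom in by cropping the central 1/scale region and resizing it back (nearest neighbour)."""
    h, w = image.shape[:2]
    ch, cw = int(h / scale), int(w / scale)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = image[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows[:, None], cols[None, :]]


def augment(image, steering_angle, rng):
    """Apply each augmentation independently with probability 0.5."""
    if rng.random() < 0.5:
        image, steering_angle = flip(image, steering_angle)
    if rng.random() < 0.5:
        image = adjust_brightness(image, rng.uniform(0.5, 1.3))
    if rng.random() < 0.5:
        image = zoom(image, rng.uniform(1.0, 1.3))
    return image, steering_angle
```

Augmentation is applied to raw frames before preprocessing, and only to training data; validation images should stay untouched so the validation loss reflects real recordings.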

Full Code available here