

REINFORCEMENT LEARNING
INTRO
For my final project in CS4100 I tried to build a reinforcement learning agent that plays Atari 2600 Frogger. The agent uses Q-learning with a deep neural network to approximate the Q-function. The network has four layers: an input layer (a sampled version of the game screen), two hidden layers of sizes 128 and 64, and an output layer of size 5. The output layer has five units because there are five actions the player can take: not moving, moving up, moving down, moving left, and moving right. I used the Arcade Learning Environment to interact with the game and to collect the rewards for the actions performed.
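To make the architecture concrete, here is a minimal sketch of the Q-network. The framework (Keras) and the input size are my assumptions; only the hidden-layer sizes (128 and 64) and the five-action output come from the description above.

    from tensorflow import keras
    from tensorflow.keras import layers

    SCREEN_SAMPLE = (105, 80)  # assumed: the 210x160 Atari screen sampled every other pixel
    NUM_ACTIONS = 5            # NOOP, UP, DOWN, LEFT, RIGHT

    model = keras.Sequential([
        keras.Input(shape=SCREEN_SAMPLE),
        layers.Flatten(),                                 # sampled screen -> vector
        layers.Dense(128, activation="relu"),             # first hidden layer
        layers.Dense(64, activation="relu"),              # second hidden layer
        layers.Dense(NUM_ACTIONS, activation="linear"),   # one Q-value per action
    ])
    model.compile(optimizer="adam", loss="mse")           # fit() is later called on Q-learning targets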
One of the problems I ran into early in training was the lack of reward signal. Especially in the first half of the game, the road section, the only reward is for moving forward, which makes it hard for the agent to learn. I tried a variety of reward schemes to improve the agent.

REINFORCEMENT AGENT USING NEURAL NETWORK

INPUT: Sampled Screen

OUTPUT: Five Actions (NOOP, UP, DOWN, LEFT, RIGHT)
EXPERIMENTS
I conducted five experiments to find the reward system that gave the best results.
#1 (Regular rewards)
For the first experiment, I used the rewards straight from the game. These rewards can be collected directly from the Arcade Learning Environment and used for fitting once the round is over. I trained the network on 1000 games; the agent scored reasonably well, but it wasn't learning much about the road section of the game.

AVERAGE SCORE OVER 50 GAMES: 12.80
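As a rough illustration of this setup, here is a sketch of one training game: play an episode using the plain game rewards from the Arcade Learning Environment, then fit the network on one-step Q-learning targets once the round is over. The ROM path, the sampling stride, the discount factor, and the random-action placeholder (standing in for an epsilon-greedy policy) are my assumptions; model here is the network sketched in the intro.

    import numpy as np
    from ale_py import ALEInterface

    GAMMA = 0.99                          # assumed discount factor

    ale = ALEInterface()
    ale.loadROM("frogger.bin")            # path to the Frogger ROM (assumed)
    actions = ale.getMinimalActionSet()   # for Frogger: NOOP, UP, DOWN, LEFT, RIGHT

    def sample_screen(ale):
        # downsample the grayscale screen into the network input (assumed stride)
        return ale.getScreenGrayscale()[::2, ::2] / 255.0

    episode = []                          # (state, action index, game reward)
    ale.reset_game()
    while not ale.game_over():
        state = sample_screen(ale)
        a_idx = np.random.randint(len(actions))   # placeholder for an epsilon-greedy choice
        reward = ale.act(actions[a_idx])          # reward straight from the game
        episode.append((state, a_idx, reward))

    # once the round is over, turn the collected rewards into Q-learning targets and fit
    states = np.stack([s for s, _, _ in episode])
    targets = model.predict(states, verbose=0)
    for i, (_, a_idx, reward) in enumerate(episode):
        future = 0.0 if i + 1 == len(episode) else GAMMA * np.max(targets[i + 1])
        targets[i, a_idx] = reward + future
    model.fit(states, targets, verbose=0)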
#2 (Additional rewards for specific actions)
In this experiment I used the same rewards given by the game but added a few extra rewards to give the agent more feedback during the road section. The additions were a bonus for getting past the road section, a large negative reward for getting hit by a car, a small negative reward for not moving (to encourage the agent to make progress), and a negative reward for moving backwards. I trained with another 1000 games for this experiment, and the results were interesting. After this experiment I also realized that the agent was exploiting the intro of the game to gain extra points: pressing up during the intro earned the moving-forward reward even though the frog wasn't actually moving.

AVERAGE SCORE OVER 50 GAMES: 12.56
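Here is a sketch of that shaped reward, just to make the bookkeeping concrete. The magnitudes, the action ids, and the way road-crossings and deaths are detected are illustrative assumptions; only which behaviors earn a bonus or a penalty comes from the experiment itself.

    # Standard Atari action ids, used here only for illustration.
    NOOP, UP, DOWN = 0, 2, 5

    def shaped_reward(game_reward, action, crossed_road, frog_died):
        reward = game_reward          # the reward given by the game itself
        if crossed_road:
            reward += 10.0            # bonus for getting past the road section (assumed value)
        if frog_died:
            reward -= 20.0            # large penalty for getting hit by a car (assumed value)
        if action == NOOP:
            reward -= 0.1             # small penalty for not moving (assumed value)
        if action == DOWN:
            reward -= 1.0             # penalty for moving backwards (assumed value)
        return reward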
#3 (Same rewards but fixed cheating)
This experiment was very similar to the second experiment with respect to rewards. The rewards didn't change, but the window of data collection did: I added a check to make sure the intro was over before using the collected data, which closed the exploit. I trained this model with 500 games. The agent's behavior was starting to show some promise, but the scores didn't really improve with the change.

AVERAGE SCORE OVER 50 GAMES: 12.24
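Here is a sketch of that intro check. The frame threshold and the helper names are assumptions; the idea is simply to throw away any transitions recorded before the intro finishes, so pressing up during the intro can no longer earn free reward.

    INTRO_FRAMES = 120   # assumed length of the Frogger intro, in emulator frames

    def collect_episode(ale, choose_action, sample_screen):
        episode = []
        ale.reset_game()
        while not ale.game_over():
            state = sample_screen(ale)
            action = choose_action(state)
            reward = ale.act(action)
            # only keep data once the intro is over
            if ale.getEpisodeFrameNumber() > INTRO_FRAMES:
                episode.append((state, action, reward))
        return episode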
#4 (Small reward improvements)
This experiment was similar to experiment 3 but with fine-tuning of the rewards given for certain actions. Because the agent's action isn't updated on every frame in the Arcade Learning Environment, some rewards were still being handed out for actions that shouldn't have earned them; in this experiment those extra rewards were corrected. This model was trained on 750 games and had more promising results.

AVERAGE SCORE OVER 50 GAMES: 11.94
#5 (Cropped screen to just the road + Sampling)
In this experiment I cropped the screen down to just the road portion and increased the sampling rate, so the input stayed a similar size but captured more detail for crossing the road. I also removed the negative rewards for staying put and moving backwards, while keeping the penalty for getting hit and the bonus for getting past the road section. With this experiment I really wanted the agent to handle the road portion well. I ended up training the agent for 2000 games but got results similar to the rest of the experiments.

AVERAGE SCORE OVER 50 GAMES: 13.00
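The preprocessing change amounts to cropping before sampling. This is a sketch under my own assumptions about which rows of the 210x160 Atari screen contain the road; the stride is chosen so the cropped input ends up roughly the same size as the full-screen input used earlier, but with more detail per row.

    def road_input(ale):
        screen = ale.getScreenGrayscale()   # full 210x160 grayscale frame
        road = screen[100:190, :]           # assumed rows covering the road section
        return road[:, ::2] / 255.0         # finer sampling: 90x80 vs the 105x80 full-screen input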
SAMPLE PLAYTHROUGHS




RETROSPECTIVE
This project was definitely very interesting, and I think I learned a lot from the experience. I didn't realize that training a reinforcement learning agent to play Frogger would be this difficult. I originally wanted to get at least one frog home, but even getting the frog across the road turned out to be a challenge. I learned that, especially in reinforcement learning, the amount and type of reward can have a huge impact. Since Frogger gives little to no reward in the first half of the game, it was crucial to add extra rewards to improve the agent. I also learned that it takes a lot of trial and error to find good rewards and parameters when teaching an agent. Thinking outside the usual gameplay helps when designing added rewards that steer the agent toward more desirable outcomes.
On a different note, the Arcade Learning Environment was a little frustrating because there isn't an easy way to look at the next state without actually transitioning to it, which made the future-reward term of the update harder to calculate. I also noticed that Frogger's screen has a flickering effect: some objects appear in one frame and disappear in the next. I'm very confident that this had some effect on my results.
