Q-learning - Defining states and rewards

I need some help with solving a problem that uses the Q-learning algorithm.

Problem description:

I have a rocket simulator in which the rocket takes random paths and sometimes crashes. The rocket has 3 engines, each of which can be either on or off. Depending on which engines are active, the rocket flies in different directions.

Functions for turning the engines on and off are available.


The task:

Construct a Q-learning controller that turns the rocket to face up at all times.

A sensor that reads the angle of the rocket is available as input.

My solution:

I have the following states:

[image: diagram of the states]

I also have the following actions:

all engines off
left engine on
right engine on
middle engine on
left and right on
left and middle on
right and middle on
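
As a rough sketch, these seven actions could be encoded as engine on/off combinations. The (left, middle, right) tuple layout and the `set_engine` callback are assumptions for illustration; they stand in for whatever on/off functions the simulator actually provides.

```python
# Each action is a hypothetical (left, middle, right) tuple of on/off flags,
# listed in the same order as the actions above.
ACTIONS = [
    (False, False, False),  # all engines off
    (True, False, False),   # left engine on
    (False, False, True),   # right engine on
    (False, True, False),   # middle engine on
    (True, False, True),    # left and right on
    (True, True, False),    # left and middle on
    (False, True, True),    # right and middle on
]


def apply_action(action_index, set_engine):
    """Switch every engine to the setting required by the chosen action.

    set_engine(name, on) is an assumed stand-in for the simulator's
    own engine on/off functions.
    """
    left, middle, right = ACTIONS[action_index]
    set_engine("left", left)
    set_engine("middle", middle)
    set_engine("right", right)
```

With this layout the Q-learning controller only has to pick an index from 0 to 6.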

And the following rewards:

Angle = 0: reward = 100
All other angles: reward = 0

Question:

Now to the question: is this a good choice of rewards and states? Can I improve my solution? Would it be better to give rewards for other angles as well?

Thanks in advance

Upvotes: 3

Answers (2)

user2570223

Reputation: 31

Try putting smaller rewards on the states next to the desired state. This will get your agent to learn to face up more quickly.
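
A minimal sketch of that idea, assuming the angle is split into 16 bins arranged in a circle with bin 0 facing up. The bin count, the goal index and the value 10 for neighbouring bins are all illustrative assumptions, not values taken from the question.

```python
N_STATES = 16    # assumed number of angle bins
GOAL_STATE = 0   # assumed index of the "facing up" bin


def shaped_reward(state, goal=GOAL_STATE, n_states=N_STATES):
    """Return 100 at the goal bin, 10 in the two adjacent bins, 0 elsewhere."""
    # Distance in bins, measured the short way around the circle.
    d = abs(state - goal) % n_states
    d = min(d, n_states - d)
    if d == 0:
        return 100.0
    if d == 1:
        return 10.0
    return 0.0
```

Keeping the neighbouring reward well below the goal reward makes it less attractive for the agent to settle next to the goal instead of on it.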

Upvotes: 2

Josh S.

Reputation: 128

16 states x 7 actions is a very small problem.
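
To give a sense of scale, here is a sketch of a tabular Q-learning controller of that size. The learning rate, discount factor and exploration rate are illustrative guesses, and the function names are hypothetical.

```python
import random

N_STATES, N_ACTIONS = 16, 7
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative values, not tuned

# The whole Q table is 16 x 7 = 112 floats.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def choose_action(state, rng=random):
    """Epsilon-greedy choice: explore with probability EPSILON, else exploit."""
    if rng.random() < EPSILON:
        return rng.randrange(N_ACTIONS)
    row = Q[state]
    return row.index(max(row))


def update(state, action, reward, next_state):
    """One-step Q-learning update for a single observed transition."""
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])
```

Each simulator step would call `choose_action`, apply the action, read the new angle, and then call `update`.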

Rewards for other angles will help you learn faster, but can create odd behaviors later depending on your dynamics.

If you don't have momentum you may decrease the number of states, which will speed up learning and reduce memory usage (which is already tiny). To find the optimal number of states, try decreasing the number of states while analyzing a metric such as reward per timestep over multiple games, or mean error (normalized by starting angle) over multiple games. Some state representations may perform much better than others. If not, choose the one which converges fastest. This should be relatively cheap with your small Q table.
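
One way to run that comparison is to make the number of states a parameter. The sketch below assumes angles are in degrees with 0 meaning "up"; both function names are hypothetical, and the error metric is only one possible reading of "mean error normalized by starting angle".

```python
def angle_to_state(angle_deg, n_states):
    """Map an angle in degrees to one of n_states equal bins.

    Bin 0 is centred on 0 degrees (assumed to mean "facing up").
    """
    width = 360.0 / n_states
    index = int(((angle_deg + width / 2.0) % 360.0) // width)
    return index % n_states  # guard against a rounding edge case at 360


def mean_normalised_error(episodes):
    """Average of |final angle| / |start angle| over several games.

    episodes is a list of (start_angle_deg, final_angle_deg) pairs, with
    angles measured from "up" in the range [-180, 180]. Games that start
    exactly at 0 are skipped to avoid dividing by zero.
    """
    ratios = [abs(final) / abs(start) for start, final in episodes if start != 0]
    if not ratios:
        raise ValueError("no episode with a non-zero starting angle")
    return sum(ratios) / len(ratios)
```

Training once per candidate value of `n_states` and comparing the resulting metric would show whether a coarser table is good enough.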

If you want to learn quickly, you may also try Q(λ) or another modified reinforcement learning algorithm that uses eligibility traces to spread each temporal-difference update over recently visited states.
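
As an illustration, here is a sketch of Watkins's Q(λ) with accumulating eligibility traces, one common variant of this idea. The table size matches the 16 x 7 problem above, and the values of ALPHA, GAMMA and LAMBDA are illustrative assumptions.

```python
N_STATES, N_ACTIONS = 16, 7
ALPHA, GAMMA, LAMBDA = 0.1, 0.9, 0.8  # illustrative values, not tuned

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
E = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # eligibility traces


def q_lambda_update(s, a, r, s_next, a_next):
    """Apply one Watkins's Q(lambda) update.

    a_next is the action the agent will actually take in s_next. If it is
    not a greedy action, all traces are cut to zero after the update.
    """
    best_next = max(Q[s_next])
    delta = r + GAMMA * best_next - Q[s][a]
    E[s][a] += 1.0  # accumulating trace for the pair just visited
    greedy = Q[s_next][a_next] == best_next
    for i in range(N_STATES):
        for j in range(N_ACTIONS):
            Q[i][j] += ALPHA * delta * E[i][j]
            E[i][j] = GAMMA * LAMBDA * E[i][j] if greedy else 0.0
```

Because earlier state-action pairs keep a decaying trace, a reward received on reaching angle 0 also updates the steps that led there, rather than only the last one.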

Edit: Depending on your dynamics, this problem may not actually be a Markov Decision Process when the state is the angle alone. For example, you may need to include the current rotation rate in the state.
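
A possible way to do that, sketched under the assumption that the rotation rate can be estimated from two consecutive angle readings. The three rate levels and the threshold of 5 degrees per time unit are arbitrary illustrative choices, and `make_state` is a hypothetical helper.

```python
def make_state(angle_bin, angle_deg, prev_angle_deg, dt, rate_threshold=5.0):
    """Combine an angle bin with a coarse rotation-rate bin into one index.

    Rate bins: 0 = angle decreasing fast, 1 = roughly steady,
    2 = angle increasing fast.
    """
    # Shortest signed difference between the two readings, so that a step
    # from 350 to 10 degrees counts as +20 rather than -340.
    diff = (angle_deg - prev_angle_deg + 180.0) % 360.0 - 180.0
    rate = diff / dt
    if rate < -rate_threshold:
        rate_bin = 0
    elif rate > rate_threshold:
        rate_bin = 2
    else:
        rate_bin = 1
    return angle_bin * 3 + rate_bin
```

With 16 angle bins this gives 48 states, which is still a very small Q table.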

Upvotes: 4
