Jay Joshi

Reputation: 888

What is the importance of the reward policy in Reinforcement Learning?

We assign a +1 reward for reaching the goal and -1 for reaching an unwanted state.

Is it necessary to also give something like a +0.01 reward for taking an action that moves the agent closer to the goal, and a -0.01 reward for taking an action that does not?

What would be the significant changes with the reward policy mentioned above?
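For concreteness, here is a minimal sketch of the two schemes being compared, assuming a simple grid-world; the names `goal_state`, `bad_state`, and the `distance_to_goal` helper are hypothetical, not from the question:

```python
# Sketch of the two reward schemes discussed above (hypothetical grid-world).

def sparse_reward(next_state, goal_state, bad_state):
    """+1 only at the goal, -1 only at the unwanted state, 0 elsewhere."""
    if next_state == goal_state:
        return 1.0
    if next_state == bad_state:
        return -1.0
    return 0.0

def dense_reward(state, next_state, goal_state, bad_state, distance_to_goal):
    """Same terminal rewards, plus +/-0.01 depending on whether the
    action moved the agent closer to the goal."""
    terminal = sparse_reward(next_state, goal_state, bad_state)
    if terminal != 0.0:
        return terminal
    if distance_to_goal(next_state) < distance_to_goal(state):
        return 0.01
    return -0.01
```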

Upvotes: 2

Views: 153

Answers (1)

Pablo EM

Reputation: 6689

From Sutton and Barto's book, Section 3.2 Goals and Rewards:

It is thus critical that the rewards we set up truly indicate what we want accomplished. In particular, the reward signal is not the place to impart to the agent prior knowledge about how to achieve what we want it to do. For example, a chess-playing agent should be rewarded only for actually winning, not for achieving subgoals such as taking its opponent's pieces or gaining control of the center of the board. If achieving these sorts of subgoals were rewarded, then the agent might find a way to achieve them without achieving the real goal. For example, it might find a way to take the opponent's pieces even at the cost of losing the game. The reward signal is your way of communicating to the robot what you want it to achieve, not how you want it achieved.

So, in general, it's a good idea to avoid introducing prior knowledge through the reward function, because it can lead to undesired results.

However, it is known that RL performance can be improved by guiding the agent's learning process through the reward function. In fact, in some complex tasks it is necessary to first guide the agent toward a secondary (easier) goal, and then change the reward so it learns the primary goal. This technique is known as reward shaping, as sketched below. An old but interesting example can be found in Randløv and Alstrøm's paper: Learning to Drive a Bicycle using Reinforcement Learning and Shaping.
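A minimal sketch of one common formulation, potential-based reward shaping (where a term gamma * phi(s') - phi(s) is added to the original reward), is shown below. It assumes states are (x, y) tuples and uses a hypothetical potential function `phi`; this is an illustration, not the method used in the bicycle paper:

```python
# Sketch of potential-based reward shaping: the shaped reward adds
# F(s, s') = gamma * phi(s') - phi(s) to the original sparse reward,
# rewarding progress toward the goal without changing which policy is optimal.

GAMMA = 0.99

def phi(state, goal_state):
    """Potential function: higher (less negative) when closer to the goal.
    Here, negative Manhattan distance on an assumed (x, y) grid."""
    return -abs(state[0] - goal_state[0]) - abs(state[1] - goal_state[1])

def shaped_reward(original_reward, state, next_state, goal_state, gamma=GAMMA):
    """Original (sparse) reward plus the potential-based shaping term."""
    shaping = gamma * phi(next_state, goal_state) - phi(state, goal_state)
    return original_reward + shaping
```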

Upvotes: 3
