Mark

Reputation: 333

How does the machine know which step leads to the maximum reward?

As I understand it, in reinforcement learning the agent receives a reward for each action it takes.

However, in a video game such as Street Fighter, most steps give no reward (reward == 0), and only at the end do we get one (e.g. the player wins, reward = 1). With so many actions in between, how does the machine know which of them were key to winning the game?

Upvotes: 2

Views: 144

Answers (1)

agold

Reputation: 6276

In Reinforcement Learning the reward can be immediate or delayed [1]:

  • The immediate reward could be:
    • a large positive value if the agent wins the game (i.e. its last action defeats the opponent);
    • a large negative value if the agent loses the game;
    • a positive value if the action damages the opponent;
    • a negative value if the agent loses health points.
  • The delayed reward comes from a future reward that the current action makes possible. For example, moving one step to the left could let the agent avoid being hit in the next step and then hit the opponent.
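To make the idea of delayed reward concrete, here is a minimal sketch (not from the answer) that computes the discounted return G_t = r_t + γ·G_(t+1) for every step of one episode. It assumes a discount factor γ = 0.9 and a hypothetical episode in which only the final step is rewarded, as in the question. Earlier steps still receive credit, just discounted by how far they are from the win.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_(t+1) for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted rewards that follow it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: no reward for four steps, then a win (+1) on the last step.
episode = [0, 0, 0, 0, 1]
print([round(g, 4) for g in discounted_returns(episode)])
# [0.6561, 0.729, 0.81, 0.9, 1.0]
```

The value of γ is an illustrative choice: values closer to 1 give early actions more credit for a late win, while smaller values make the agent focus on near-term rewards.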

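Reinforcement learning algorithms such as Q-learning choose the action with the highest expected cumulative (discounted) reward, Q(s, a). This estimate is updated repeatedly using the current reward (r at time t) and an estimate of future rewards (the last term in the equation, max Q, based on the actions available from time t+1 onward):

$$Q^{new}(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) \right)$$

where α is the learning rate and γ is the discount factor. Because every update pulls in the value of the next state, a reward received only at the end of the game gradually propagates back to the earlier actions that led to it.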
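A small tabular sketch can show how Q-learning assigns credit when the reward only arrives at the end. It uses the update Q(s_t, a_t) ← (1 − α)·Q(s_t, a_t) + α·(r_t + γ·max_a Q(s_(t+1), a)). Instead of Street Fighter, it uses a hypothetical 5-state corridor: the agent starts at state 0, can move left or right, and gets reward 1 only on reaching state 4. All names and hyperparameters (N_STATES, ALPHA, GAMMA, EPSILON, the ε-greedy exploration) are illustrative assumptions, not part of the answer.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # illustrative hyperparameters


def step(state, action):
    """Toy environment: reward 1 only when the goal is reached, 0 on every other step."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                # Break ties randomly so the untrained agent does not always go left.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # The terminal state has no future value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: blend old estimate with reward + discounted best future value.
            q[(state, action)] = (1 - ALPHA) * q[(state, action)] + ALPHA * (
                reward + GAMMA * future
            )
            state = next_state
    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: move right in every non-goal state
print([round(q[(s, 1)], 3) for s in range(N_STATES - 1)])
```

Although only the final move is ever rewarded, the learned values for "move right" should approach powers of γ (roughly 0.729, 0.81, 0.9, 1.0 from state 0 to state 3). The win has been propagated back to the first step.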

More detailed information about (Deep) Reinforcement Learning, with some examples of applications to games, is given in A Beginner's Guide to Deep Reinforcement Learning.

Upvotes: 2
