AgnosticCucumber

Reputation: 666

Epsilon-greedy algorithm

I understand the epsilon-greedy algorithm, but two points confuse me.

  1. Does it keep track of the average reward or of the value? It is usually explained in the context of the multi-armed bandit, but that problem makes no distinction between reward and value.
  2. Is the epsilon-greedy algorithm a subset of Q-learning? A loose definition of Q-learning seems to be: approximating the optimal Q-function from past experience.

Upvotes: 1

Views: 1856

Answers (1)

Simon

Reputation: 5402

Epsilon-greedy is a policy, not an algorithm. It applies only to problems with discrete actions: you select the action according to

argmax_a Q(s,a) with probability 1-epsilon
a random action otherwise
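
As a rough illustration of that rule, here is a minimal Python sketch. It assumes the Q-values for the current state are already available as a plain list indexed by action; the function name `epsilon_greedy` and the `rng` parameter are made up for this example and do not come from any particular library.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index given the Q(s, a) estimates for one state.

    q_values: list of floats, one entry per discrete action (assumed layout).
    epsilon:  exploration probability in [0, 1].
    rng:      source of randomness; defaults to the random module.
    """
    if rng.random() < epsilon:
        # Explore: uniform random action (it may happen to be the greedy one).
        return rng.randrange(len(q_values))
    # Exploit: action with the highest estimate (first one wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Example call: three actions, 10% exploration.
action = epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1)
```

The function only decides which action to take. How the Q-values themselves get updated is left to whatever learning method it is paired with.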

You can use it with Q-learning, SARSA, DDPG, policy gradient, ...

Upvotes: 3
