Sa Ra

Difference between the effect of episodes and time steps in DQN, and where is the experience replay updated?

In DeepMind's DQN paper, there are two loops: one over episodes and one over the time steps within each episode (one for training and one for the different time steps of running). Am I right?

Since nothing is done in the outer loop except initialization and resetting to the conditions of the first step, what is the difference between them?

For instance, if in case 1 we run 1000 episodes of 400 time steps each, and in case 2 we run 4000 episodes of 100 time steps each (both amount to 400,000 time steps in total), what differences should we expect?

(Is the difference that the second one has a better chance of escaping a local minimum, or something similar? Or are both the same?)

My other question: where in the algorithm is the experience replay memory updated?

Answers (1)

Kevin Fang

For your first question: yes, there are two loops, and the two settings you describe do differ.

You have to think about what an episode really means. In most cases, we can consider each episode to be one 'game', and a game needs to have an end. We should do our best to let every game end within the length of an episode (imagine how little you could learn from a labyrinth game if you could never get out). The Q values in DQN approximate the 'current reward' plus the 'discounted future rewards', and you need to know when the future ends to make a better approximation.
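For reference, the target used for each sampled transition in the DQN paper's algorithm makes this concrete (notation as I recall it from the paper: φ is the preprocessed state, θ the network weights, γ the discount factor). When the next state is terminal, the target is just the reward; otherwise the discounted estimate of future rewards is added:

```latex
y_j =
\begin{cases}
  r_j & \text{if } \phi_{j+1} \text{ is terminal} \\
  r_j + \gamma \max_{a'} Q(\phi_{j+1}, a'; \theta) & \text{otherwise}
\end{cases}
```

The terminal case is exactly where "knowing when the future ends" matters: only a transition that truly ends the game stops the bootstrapping from future estimates.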

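So, assuming it usually takes about 200 steps to finish the game, an episode of 100 time steps is very different from an episode of 400 time steps: with only 100 steps the agent is cut off before the game can end, so it never sees how the game finishes, while 400 steps leave enough room to reach the end.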
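To make the 200-step example concrete, here is a minimal sketch with a hypothetical corridor game (`run_episode` and `goal_distance` are made-up names for illustration). The goal is 200 steps away, and the agent is assumed to always move toward it, i.e. the best possible policy. Even then, an episode capped at 100 steps is cut off before the game ends and never sees the reward, while a 400-step episode reaches the end after 200 steps.

```python
def run_episode(max_steps, goal_distance=200):
    """Hypothetical 1-D corridor: the agent moves one cell per step and the
    game only ends (with reward 1.0) when it reaches the goal cell.
    Returns (reached_terminal, total_reward, steps_taken)."""
    position = 0
    for step in range(1, max_steps + 1):
        position += 1  # assumption: the agent always moves toward the goal
        if position == goal_distance:
            return True, 1.0, step  # the true end of the game
    return False, 0.0, max_steps  # cut off by the episode length limit


for max_steps in (100, 400):
    reached, reward, steps = run_episode(max_steps)
    print(f"max_steps={max_steps}: reached_terminal={reached}, "
          f"reward={reward}, steps={steps}")
```

With the 100-step cap, no episode ever produces a terminal transition, so nothing like the terminal case of the target is ever learned, no matter how many episodes you run.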

As for the experience replay update, it happens at every time step. I'm not sure what you're asking; if you can explain your question in more detail, I think I could answer it.
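A minimal sketch of where the replay memory is updated, assuming a hypothetical toy corridor environment and using a lookup table in place of the Q-network (so the gradient step becomes a tabular Q-learning update toward the same target). The names `ReplayMemory`, `Corridor` and `train` are made up for illustration. The transition is pushed into the memory at every time step of the inner loop, right after the environment step, and a random minibatch is sampled from it immediately afterwards; the outer loop only resets the game.

```python
import random
from collections import deque


class ReplayMemory:
    """Replay memory D: keeps the most recent `capacity` transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out first

    def push(self, state, action, reward, next_state, terminal):
        self.buffer.append((state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)  # uniform, no repeats

    def __len__(self):
        return len(self.buffer)


class Corridor:
    """Hypothetical toy game: cells 0..length, action 1 moves right and
    action 0 moves left; reaching the last cell ends the game with reward 1."""

    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminal = self.pos == self.length
        return self.pos, (1.0 if terminal else 0.0), terminal


def train(num_episodes, max_steps, gamma=0.99, alpha=0.1, epsilon=0.1,
          batch_size=32, capacity=10_000, seed=0):
    random.seed(seed)
    env = Corridor()
    memory = ReplayMemory(capacity)  # created once, before both loops
    q = {}  # lookup table standing in for the Q-network: q[(state, action)]
    total_steps = 0

    def q_value(s, a):
        return q.get((s, a), 0.0)

    def greedy(s):
        best = max(q_value(s, a) for a in (0, 1))
        return random.choice([a for a in (0, 1) if q_value(s, a) == best])

    for _ in range(num_episodes):  # outer loop: only resets the game
        state = env.reset()
        for _ in range(max_steps):  # inner loop: one time step per iteration
            action = random.randrange(2) if random.random() < epsilon else greedy(state)
            next_state, reward, terminal = env.step(action)
            total_steps += 1
            # Experience replay update: store the transition at EVERY step.
            memory.push(state, action, reward, next_state, terminal)
            # Then learn from a random minibatch of stored transitions.
            if len(memory) >= batch_size:
                for s, a, r, s2, done in memory.sample(batch_size):
                    target = r if done else r + gamma * max(q_value(s2, b) for b in (0, 1))
                    q[(s, a)] = q_value(s, a) + alpha * (target - q_value(s, a))
            state = next_state
            if terminal:
                break  # the game ended before the step limit
    return q, memory, total_steps


q, memory, total_steps = train(num_episodes=50, max_steps=100)
print(total_steps, len(memory))
```

Because the memory is created outside both loops, it keeps transitions from earlier episodes; only the environment is reset at the start of each episode.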