Difference between the effect of episodes and time steps in DQN, and where is the experience replay updated?

Question

In DeepMind's DQN paper, the algorithm has two nested loops: an outer loop over episodes and an inner loop over the time steps within each episode (as I understand it, one for training and one for the individual time steps of a run). Am I right?

Since nothing is done in the outer loop except initialization and resetting to the conditions of the first step, what is the difference between the two loops?

For instance, if case 1 runs 1000 episodes of 400 time steps each and case 2 runs 4000 episodes of 100 time steps each, what differences should we expect between them?

(Is the difference that the second case has a better chance of escaping a local minimum, or something similar? Or are the two equivalent?)

Another question: where in the algorithm is the experience replay updated?
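To make the two loops concrete, here is a minimal Python sketch of how I read the structure of the algorithm. It is only an illustration under stated assumptions: `ToyEnv`, `run`, and all parameter names are hypothetical, the action choice is uniformly random instead of epsilon-greedy, and the network update is reduced to a counter. It only shows where, in this reading, a transition is stored in the replay memory and where a minibatch is sampled from it.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical stand-in environment: a 1-D walk that ends at +3 or -3."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        done = abs(self.pos) >= 3
        reward = 1.0 if self.pos >= 3 else 0.0
        return self.pos, reward, done


def run(num_episodes, max_steps, capacity=1000, batch_size=8, seed=0):
    rng = random.Random(seed)
    env = ToyEnv()
    replay = deque(maxlen=capacity)  # replay memory; oldest entries drop out when full
    stored = 0   # transitions written to the replay memory
    updates = 0  # minibatches sampled (stand-in for gradient steps)

    for episode in range(num_episodes):      # outer loop: episodes
        state = env.reset()                  # only a reset happens out here
        for t in range(max_steps):           # inner loop: time steps
            action = rng.choice([0, 1])      # placeholder for epsilon-greedy
            next_state, reward, done = env.step(action)

            # The replay memory is updated here, once per time step.
            replay.append((state, action, reward, next_state, done))
            stored += 1

            # A minibatch is drawn from the memory, also once per time step.
            if len(replay) >= batch_size:
                batch = rng.sample(list(replay), batch_size)
                updates += 1                 # placeholder for the Q-network update

            state = next_state
            if done:                         # the episode may end before max_steps
                break
    return stored, updates, len(replay)


stored, updates, size = run(num_episodes=5, max_steps=50)
print(stored, updates, size)
```

In this sketch the replay memory and the update counter persist across episodes, so the two cases above differ in how often the environment is reset and how long each trajectory can get, while the upper bound on total time steps is the same (1000 × 400 = 4000 × 100 = 400,000).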



