In DQN, why is y_i calculated but not stored?

Question

The DQN algorithm below

We have phi_t, a_t, r_t and phi_{t+1} fields in D's records. Why don't we have a 'y' field in D's records, so we can store 'y' values once calculated?

I mean, the minibatches are chosen randomly from D without any restrictions, so one record may be chosen multiple times, especially when the number of records in D is not large enough. If that happens, y needs to be recalculated multiple times. Am I thinking about this correctly?

Pablo EM · Accepted Answer

Because y_i is computed using the function Q, which changes from iteration to iteration. Therefore, the values stored in one iteration are not valid for later iterations.
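To make this concrete, here is a minimal sketch of the usual DQN target, y = r for a terminal transition and y = r + gamma * max_a Q(phi', a; theta) otherwise. The toy linear Q-function, the parameter values, and the names `q_values`, `td_target`, `theta_old`, and `theta_new` are all assumptions made up for illustration; they are not taken from the algorithm in the question.

```python
GAMMA = 0.99  # discount factor (assumed value)

def q_values(theta, phi):
    """Toy linear Q-function: one weight row per action (illustrative only)."""
    return [sum(w * x for w, x in zip(row, phi)) for row in theta]

def td_target(theta, r, phi_next, terminal):
    """y = r if terminal, else r + GAMMA * max_a Q(phi_next, a; theta)."""
    if terminal:
        return r
    return r + GAMMA * max(q_values(theta, phi_next))

# One stored transition: only r and phi_next matter for the target.
r = 1.0
phi_next = [1.0, 2.0]

theta_old = [[0.5, 0.0], [0.0, 0.25]]  # parameters when y was first computed
theta_new = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical parameters after some updates

y_old = td_target(theta_old, r, phi_next, terminal=False)
y_new = td_target(theta_new, r, phi_next, terminal=False)

# Same transition, different parameters, different target.
print(y_old, y_new)
```

The stored transition is identical in both calls; only the parameters differ, and that alone changes y. A 'y' field written into D under `theta_old` would be stale by the time the record is sampled again.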

Within the same iteration, I think you are right to point out that if the same transition is sampled several times, y_i does not need to be computed several times; you can reuse the same result. I guess the pseudocode focuses on the key concepts rather than on this kind of implementation detail.
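If you did want to avoid the repeated work within one minibatch, a rough sketch could look like the following. It assumes the minibatch is given as a list of buffer indices that may contain repeats; `targets_for_batch` and `compute_target` are hypothetical names, and the cache is thrown away after each minibatch because Q changes before the next one.

```python
def targets_for_batch(indices, compute_target):
    """Compute y once per distinct buffer index within a single minibatch.

    indices: buffer positions sampled for this minibatch (may repeat).
    compute_target: callable mapping a buffer index to its y value
                    under the current Q (hypothetical helper).
    """
    cache = {}  # valid only for this minibatch
    targets = []
    for i in indices:
        if i not in cache:
            cache[i] = compute_target(i)
        targets.append(cache[i])
    return targets

# Usage: count how often the target is really computed.
calls = []

def fake_target(i):
    calls.append(i)
    return i * 10.0  # stand-in for r + gamma * max_a Q(...)

batch = [0, 2, 0, 1, 2]
ys = targets_for_batch(batch, fake_target)
print(ys, len(calls))
```

In practice the target for a whole minibatch is often computed in one batched forward pass, so the saving from this kind of deduplication is likely small.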
