landings

Reputation: 770

In DQN, why is y_i calculated but not stored?

Consider the DQN algorithm below:

[Image: DQN algorithm pseudocode]

Source

The records in D have the fields phi_t, a_t, r_t and phi_{t+1}. Why isn't there also a 'y' field, so that 'y' values can be stored once they are calculated?

I mean, the minibatches are sampled randomly from D without any restrictions, so one record may be chosen multiple times, especially when D does not yet contain many records. If that happens, y has to be recalculated each time. Am I thinking about this correctly?

Upvotes: 2

Views: 103

Answers (1)

Pablo EM

Reputation: 6669

Because y_i is computed using the function Q, which changes from iteration to iteration. Therefore, the values stored in one iteration are no longer valid in later iterations.
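To make this concrete, here is a minimal sketch. It assumes a small tabular Q (a NumPy array) standing in for the network, and a discount factor of 0.99; the names `td_target`, `q` and `transition` are made up for illustration and are not from the pseudocode.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)

def td_target(q_table, reward, next_state, done):
    """y = r for a terminal transition, else r + gamma * max_a' Q(next_state, a')."""
    if done:
        return reward
    return reward + GAMMA * np.max(q_table[next_state])

# Hypothetical tabular Q with 3 states and 2 actions, standing in for the network.
q = np.zeros((3, 2))

# One stored transition; only the fields needed for the target are shown.
transition = dict(reward=1.0, next_state=2, done=False)

# Target computed with the current Q.
y_before = td_target(q, **transition)

# A gradient step changes Q; here that is simulated by editing one entry.
q[2, 1] = 5.0

# Same transition, but the target is now different.
y_after = td_target(q, **transition)

print(y_before, y_after)
```

The transition itself did not change, only Q did, so a 'y' stored alongside the transition would be stale as soon as Q is updated.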

Within the same iteration, I think you are right to point out that if you sample the same transition several times, it's not necessary to compute y_i several times; you can reuse the same result. I guess the pseudocode is more focused on the key concepts than on this kind of implementation detail.
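If you did want to avoid the repeated work within a single minibatch, one possible approach is to compute y only for the distinct sampled indices and then expand the result back to the batch. This is just a sketch with assumed names (`rewards`, `next_states`, `dones`, `batch_idx`) and a tabular Q again; it is not part of the original algorithm.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)

rng = np.random.default_rng(0)

# Hypothetical replay memory with 4 transitions, stored as parallel arrays.
rewards = np.array([1.0, 0.0, 2.0, 0.5])
next_states = np.array([2, 0, 1, 2])
dones = np.array([False, False, True, False])

# Tabular Q (3 states, 2 actions) standing in for the current network.
q = rng.normal(size=(3, 2))

# A minibatch sampled with replacement, so indices 3 and 1 appear twice.
batch_idx = np.array([3, 1, 3, 0, 1])

# Distinct indices, plus the mapping from each batch slot to its distinct index.
unique_idx, inverse = np.unique(batch_idx, return_inverse=True)

# Compute the target once per distinct transition, using the current Q.
max_next_q = q[next_states[unique_idx]].max(axis=1)
y_unique = rewards[unique_idx] + GAMMA * max_next_q * (~dones[unique_idx])

# Expand back to one target per batch slot.
y_batch = y_unique[inverse]
```

Note that this only reuses values inside one iteration, where Q is fixed. Whether it is worth doing is a judgment call, since the targets for a whole minibatch are usually computed in one batched pass anyway.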

Upvotes: 2
