Reputation: 770
The DQN algorithm below
Each record in D has the fields phi_t, a_t, r_t and phi_{t+1}. Why don't we also have a 'y' field in D's records, so that we can store the 'y' values once they are calculated?
I mean, the minibatches are chosen randomly from D without any restrictions, so one record may be chosen multiple times, especially when the number of records in D is not large enough. If that happens, y needs to be recalculated multiple times. Am I thinking about this correctly?
Upvotes: 2
Views: 103
Reputation: 6669
Because y_i is computed using the function Q, whose parameters change from one iteration to the next. A value of y_i stored in one iteration is therefore no longer valid in later iterations.
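To make this concrete, here is a minimal sketch in plain Python. It assumes a toy linear Q-function and made-up feature vectors, and all the names in it (`q_values`, `td_target`, `sgd_step`) are hypothetical, not taken from the paper. It computes y for one stored transition, applies a single gradient step, and computes y again for the same transition.

```python
GAMMA = 0.9  # discount factor (assumed value)

def q_values(theta, phi):
    """Toy linear Q-function: Q(phi, a; theta) = theta[a] . phi, for every action a."""
    return [sum(w * x for w, x in zip(row, phi)) for row in theta]

def td_target(theta, reward, phi_next, gamma=GAMMA):
    """y = r + gamma * max_a' Q(phi_next, a'; theta) for a non-terminal transition."""
    return reward + gamma * max(q_values(theta, phi_next))

def sgd_step(theta, phi, action, y, lr=0.1):
    """One gradient step on (y - Q(phi, action; theta))^2, treating y as a constant."""
    error = y - q_values(theta, phi)[action]
    new_theta = [row[:] for row in theta]
    new_theta[action] = [w + lr * error * x for w, x in zip(theta[action], phi)]
    return new_theta

# One stored transition (phi_t, a_t, r_t, phi_{t+1}); the numbers are made up.
phi_t, a_t, r_t, phi_next = [1.0, 0.5], 0, 1.0, [0.5, 1.0]

theta = [[0.0, 0.0], [0.0, 0.0]]               # two actions, two features
y_before = td_target(theta, r_t, phi_next)     # target under the current weights
theta = sgd_step(theta, phi_t, a_t, y_before)  # the weights change after the update
y_after = td_target(theta, r_t, phi_next)      # same transition, different target

print(y_before, y_after)
```

The transition itself did not change, but `y_after` differs from `y_before` because the weights did, so a 'y' stored in D would be stale after the very next update.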
Within the same iteration, I think you are right: if the same transition is sampled several times, there is no need to compute y_i several times, and you can reuse the result. I guess the pseudocode focuses on the key concepts rather than on this kind of implementation detail.
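If you did want that optimization, a rough sketch could look like the following. It assumes the minibatch is drawn as a list of indices into D, and `toy_target` is only a stand-in for the real r + gamma * max Q computation; both helper names are hypothetical.

```python
def minibatch_targets(indices, replay, target_fn):
    """Compute y once per distinct replay index and reuse it for repeated draws."""
    cache = {}
    for i in indices:
        if i not in cache:
            cache[i] = target_fn(replay[i])
    return [cache[i] for i in indices], len(cache)

calls = []  # records every actual target evaluation

def toy_target(transition):
    """Stand-in for y = r + gamma * max_a' Q(phi_next, a'); not a real Q-network."""
    phi, action, reward, phi_next = transition
    calls.append(transition)
    return reward + 0.9 * max(phi_next)

# Tiny replay memory of (phi_t, a_t, r_t, phi_{t+1}) records with made-up values.
replay = [([0.0], 0, 1.0, [0.5]), ([0.5], 1, 0.0, [1.0]), ([1.0], 0, 2.0, [0.0])]
indices = [0, 2, 0, 1, 0]  # index 0 was drawn three times

ys, n_evals = minibatch_targets(indices, replay, toy_target)
print(len(ys), n_evals)  # five targets returned, three evaluations performed
```

The cache has to be discarded after each gradient step, for the reason given above. In practice the targets for a minibatch are usually computed in one batched forward pass, so the saving is likely to be small.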
Upvotes: 2