Reputation: 17292
My SARSA with gradient descent keeps escalating the weights exponentially. At Episode 4, step 17, the value is already nan:
Exception: Qa is nan
For example:
6) Qa:
Qa = -2.00890180632e+303
7) NEXT Qa:
Next Qa with west = -2.28577776413e+303
8) THETA:
1.78032402991e+303 <= -0.1 + (0.1 * -2.28577776413e+303) - -2.00890180632e+303
9) WEIGHTS (sample)
5.18266630725e+302 <= -1.58305782482e+301 + (0.3 * 1.78032402991e+303 * 1)
I don't know where to look for the mistake I made. Here's some code FWIW:
def getTheta(self, reward, Qa, QaNext):
    """ theta = r + gamma * Qw(s',a') - Qw(s,a) """
    theta = reward + (self.gamma * QaNext) - Qa
    return theta

def updateWeights(self, Fsa, theta):
    """ wi <- wi + alpha * theta * Fi(s,a) """
    for i, w in enumerate(self.weights):
        self.weights[i] += (self.alpha * theta * Fsa[i])
I have about 183 binary features.
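For context, this is roughly how the two methods are driven each step (chooseAction, getFeatures and getQ are simplified stand-ins for other parts of my agent):

# one SARSA step with linear function approximation (simplified)
Fsa = self.getFeatures(state, action)          # binary feature vector for (s, a)
Qa = self.getQ(Fsa)                            # Qw(s, a) = sum_i w_i * F_i(s, a)
nextAction = self.chooseAction(nextState)      # e.g. epsilon-greedy
QaNext = self.getQ(self.getFeatures(nextState, nextAction))
theta = self.getTheta(reward, Qa, QaNext)      # TD error
self.updateWeights(Fsa, theta)                 # w_i += alpha * theta * F_i(s, a)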
Upvotes: 3
Views: 1477
Reputation: 5
I do not have access to the full code of your application, so I might be wrong, but I think I know where you are going wrong. First and foremost, normalization should not be necessary here. Weights blowing up this early suggests something wrong with the implementation itself.
I think your update equation should be:
self.weights[:, action_i] = self.weights[:, action_i] + (self.alpha * theta * Fsa[i])
That is, you should be updating columns instead of rows, because rows are for states and columns are for actions in the weight matrix.
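A rough numpy sketch of that idea, assuming the weight matrix has one row per feature and one column per action, and that the index of the chosen action (action_i here) gets passed in:

import numpy as np

def updateWeights(self, Fsa, theta, action_i):
    """ Only update the weight column of the action that was actually taken:
        w[:, a] <- w[:, a] + alpha * theta * F(s, a) """
    Fsa = np.asarray(Fsa, dtype=float)         # binary feature vector, shape (num_features,)
    self.weights[:, action_i] += self.alpha * theta * Fsa

This way a single TD error only moves the weights tied to the action that produced it.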
Upvotes: 0
Reputation: 6434
You need normalization in each trial. This will keep the weights in a bounded range (e.g. [0, 1]). The way you are adding to the weights each time just grows them, and they become useless after the first trial.
I would do something like this:
self.weights[i] += (self.alpha * theta * Fsa[i])
normalize(self.weights[i], wmin, wmax)
Or see a similar example from the RL literature.
You need to write the normalization function by yourself though ;)
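For example, a minimal min-max version (a sketch, assuming you want to rescale the whole weight vector into [wmin, wmax], e.g. [0, 1]):

def normalize(w, wmin, wmax):
    """ Rescale the weight vector w into [wmin, wmax] in place (min-max normalization). """
    lo, hi = min(w), max(w)
    if hi == lo:                               # all weights equal: nothing to rescale
        return
    for i in range(len(w)):
        w[i] = wmin + (w[i] - lo) * (wmax - wmin) / (hi - lo)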
Upvotes: 2