Reputation: 561
This is a code snippet that uses the Keras library to train a model:
for state, action, reward, next_state, done in minibatch:
    target = reward
    if not done:
        target = (reward + self.gamma *
                  np.amax(self.model.predict(next_state)[0]))
    target_f = self.model.predict(state)
    #print (target_f)
    target_f[0][action] = target
    self.model.fit(state, target_f, epochs=1, verbose=0)
I am trying to vectorize it. The only way I can think of is:
1. Create a numpy table with each row = (state, action, reward, next_state, done, target), so there will be "mini-batch" number of rows.
2. Update the target column based on the other columns (using masked arrays):
target[done==True] = reward
target[done==False] = reward + self.gamma * np.amax(self.model.predict(next_state)[0])
NB: state is 8-D, so state vector has 8 elements.
Despite hours of effort, I have been unable to code this properly. Is it actually possible to vectorize this piece of code?
Upvotes: 2
Views: 309
Reputation: 1655
You are very close! Assuming that minibatch is an np.array:
First find all the rows where done is true, assuming done is at index 4.
minibatch_done=minibatch[np.where(minibatch[:,4]==True)]
minibatch_not_done=minibatch[np.where(minibatch[:,4]==False)]
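The split above can also be written with a boolean mask. Here is a minimal sketch on a made-up three-row minibatch; the toy rows, the (1, 8) state shape, and the name done_mask are assumptions for illustration, not part of the original code:

```python
import numpy as np

# Hypothetical toy minibatch: columns are (state, action, reward, next_state, done).
rows = [
    (np.zeros((1, 8)), 0, 1.0, np.ones((1, 8)), False),
    (np.zeros((1, 8)), 1, -1.0, np.ones((1, 8)), True),
    (np.zeros((1, 8)), 2, 0.5, np.ones((1, 8)), False),
]

# Fill an object array cell by cell so NumPy does not try to
# broadcast the state arrays into extra dimensions.
minibatch = np.empty((len(rows), 5), dtype=object)
for i, row in enumerate(rows):
    for j, item in enumerate(row):
        minibatch[i, j] = item

# Column 4 holds the done flags; convert it to a real boolean mask.
done_mask = minibatch[:, 4].astype(bool)
minibatch_done = minibatch[done_mask]
minibatch_not_done = minibatch[~done_mask]

print(minibatch_done.shape, minibatch_not_done.shape)
```

Using the mask and its negation guarantees that every row lands in exactly one of the two groups.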
Now we use this to update the minibatch matrix conditionally, assuming index 2 is reward and index 3 is next_state:
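target = np.empty(minibatch.shape[0])
n_done = minibatch_done.shape[0]
# Rows 0..n_done-1 are terminal transitions: the target is just the reward
target[:n_done] = minibatch_done[:,2]
# Remaining rows are non-terminal: add the discounted max Q-value of each next_state
# (assumes each next_state is a NumPy array, so the column can be stacked)
next_q = self.model.predict(np.vstack(minibatch_not_done[:,3]))
target[n_done:] = minibatch_not_done[:,2] + self.gamma*np.amax(next_q, axis=1)
# Note: target lists the done rows first, so order the states the same way before fitting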
And there you have it :)
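If you would rather keep the rows in their original order, something like the following may work. It is only a sketch: StubModel is a hypothetical NumPy stand-in for the Keras model so the example runs on its own, batch_targets is a made-up helper name, and each state is assumed to have shape (1, 8) as in the question's loop.

```python
import numpy as np

class StubModel:
    """Hypothetical stand-in for the Keras model: a fixed linear map."""
    def __init__(self, n_features, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_features, n_actions))

    def predict(self, x):
        # Returns one row of Q-values per input row.
        return x @ self.w

def batch_targets(model, minibatch, gamma):
    """Build (states, target_f) for one batched fit call."""
    # Unpack the list of tuples into one array per field.
    states = np.vstack([s for s, _, _, _, _ in minibatch])
    actions = np.array([a for _, a, _, _, _ in minibatch])
    rewards = np.array([r for _, _, r, _, _ in minibatch], dtype=float)
    next_states = np.vstack([ns for _, _, _, ns, _ in minibatch])
    dones = np.array([d for _, _, _, _, d in minibatch], dtype=bool)

    # Max Q-value of every next_state, computed in one predict call.
    next_q = np.amax(model.predict(next_states), axis=1)

    # Terminal rows get the bare reward, the rest get the discounted bootstrap.
    targets = np.where(dones, rewards, rewards + gamma * next_q)

    # Start from the current predictions and overwrite only the taken action.
    target_f = model.predict(states)
    target_f[np.arange(len(minibatch)), actions] = targets
    return states, target_f
```

With a real Keras model you would then call self.model.fit(states, target_f, epochs=1, verbose=0) once per minibatch. Note that this is not numerically identical to the loop in the question: the loop updates the weights after every sample, so later predictions there already reflect earlier updates, whereas here all predictions come from the same weights.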
Edit: Fixed index error in target problems
Upvotes: 3