Reputation: 15
I am currently trying to implement an LSTM network for my robot on a navigation task. I have already applied DDQN with a plain feed-forward NN and want to compare it with an LSTM network. I also managed to implement the LSTM in its simplest form, with a sample and time-step size of 1 and a feature size of 364, i.e. an input shape of (1, 1, 364). From my research, that effectively behaves like a standard FFNN because of the single time step and sample.
That is why I want to increase the time-step size, e.g. to 10 steps. But now another problem occurred. Within the DDQN I use batch learning with batch_size = 64. However, if I choose a time-step size of 10, that no longer works, because the LSTM network now wants 9 additional time steps on top of each sample.
This is where my question arises. If I draw a random batch from a memory that stores information for a single point in time (state, next_state, action, reward, done, all at some time t), do I need to pick a random sample and then add the 9 data points preceding it, so that I end up with a batch of shape (64, 10, 364)?
Kind regards
Latest code:
def buildModel(self):
    model = Sequential()
    model.add(LSTM(64, input_shape=(1, 364), return_sequences=True))
    model.add(LSTM(64))
    model.add(Dense(self.action_size, kernel_initializer='lecun_uniform'))
    model.add(Activation('linear'))
    model.compile(loss='mse', optimizer=RMSprop(lr=self.learning_rate, rho=0.9, epsilon=1e-06))
    model.summary()
    return model
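For comparison, here is a minimal sketch of how buildModel could look once the network consumes sequences of 10 time steps instead of 1. The only structural change is the input_shape; the layer sizes and optimizer settings are simply carried over from the code above, not a recommendation:

def buildModel(self):
    # Sketch only: same architecture as above, but each sample now holds 10 time steps
    # of the 364-dimensional observation instead of a single step.
    model = Sequential()
    model.add(LSTM(64, input_shape=(10, 364), return_sequences=True))
    model.add(LSTM(64))
    model.add(Dense(self.action_size, kernel_initializer='lecun_uniform'))
    model.add(Activation('linear'))
    model.compile(loss='mse', optimizer=RMSprop(lr=self.learning_rate, rho=0.9, epsilon=1e-06))
    return model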
Code for training:
def trainModel(self, target=False):
    mini_batch = random.sample(self.memory, self.batch_size)
    X_batch = np.empty((0, self.state_size), dtype=np.float64)
    Y_batch = np.empty((0, self.action_size), dtype=np.float64)
    Z_batch = np.empty((0, 1), dtype=np.float64)
    for i in range(self.batch_size):
        states = mini_batch[i][0]
        actions = mini_batch[i][1]
        rewards = mini_batch[i][2]
        next_states = mini_batch[i][3]
        dones = mini_batch[i][4]
        q_value = self.model.predict(states.reshape((1, 1, len(states))))
        if target:
            next_target = self.target_model.predict(next_states.reshape((1, 1, len(next_states))))
        else:
            next_target = self.model.predict(next_states.reshape((1, 1, len(next_states))))
        next_q_value = self.getQvalue(rewards, next_target, dones)
        X_batch = np.append(X_batch, np.array([states.copy()]), axis=0)
        Y_sample = q_value.copy()
        Y_sample[0][actions] = next_q_value
        Y_batch = np.append(Y_batch, np.array([Y_sample[0]]), axis=0)
        if dones:
            X_batch = np.append(X_batch, np.array([next_states.copy()]), axis=0)
            Y_batch = np.append(Y_batch, np.array([[rewards] * self.action_size]), axis=0)
    X_batch = X_batch.reshape((len(Y_batch), 1, len(next_states)))
    self.model.fit(X_batch, Y_batch, batch_size=self.batch_size, epochs=1, verbose=0)
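To feed a (64, 10, 364) batch into this training step, one option is to keep the observations of each episode around (or store indices into the episode) and stack the 9 preceding observations onto every sampled transition. A minimal, hypothetical sketch; the episode_buffer layout, the getSequence helper, and sampled_indices are assumptions for illustration, not part of the original code:

def getSequence(self, episode_buffer, t, seq_len=10):
    # Take the observation at time t together with the seq_len - 1 observations before it.
    start = max(0, t - seq_len + 1)
    window = list(episode_buffer[start:t + 1])
    # Pad with the first observation when the episode is shorter than seq_len at time t.
    while len(window) < seq_len:
        window.insert(0, window[0])
    return np.stack(window)   # shape: (seq_len, state_size)

# Inside trainModel, the input batch could then be assembled as:
# X_batch = np.stack([self.getSequence(ep, t) for (ep, t) in sampled_indices])
# X_batch.shape == (self.batch_size, 10, self.state_size)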
Upvotes: 0
Views: 205
Reputation: 809
If you are using an LSTM, the idea is that there is something to be learned from the sequence, as opposed to a single time point, that makes the final prediction "better." So I believe you'll want to construct the input data such that each "sample" consists of 10 time steps' worth of your feature data. The next sample can then be created by adding the next time step and dropping the earliest one (i.e. a rolling window). There are probably other ways to accomplish the same general idea, but that is one suggestion. I hope I understood your use case, but I believe this is a direction to consider, as in the sketch below. I hope this helps.
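As a concrete illustration of the rolling-window idea (a sketch only; seq_len and the deque-based buffer are my assumptions, not tested against your code):

from collections import deque
import numpy as np

seq_len = 10
window = deque(maxlen=seq_len)   # rolling window over the last 10 observations

def observe(state, window, seq_len=10):
    # Append the newest observation; the deque drops the oldest one automatically.
    window.append(state)
    # Pad with the first observation until the window is full (start of an episode).
    padded = list(window)
    while len(padded) < seq_len:
        padded.insert(0, padded[0])
    return np.stack(padded)[np.newaxis, ...]   # shape: (1, seq_len, 364)

# Each call returns one LSTM-ready sample; 64 of these stacked give (64, 10, 364).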
Upvotes: 0