Reputation: 53
This is the LSTM code from Udacity for sentiment classification.
Here is the link to the whole sentiment-rnn code: udacity/sentiment-rnn
I wonder why they initialize the cell state right under the epoch for-loop.
I think the cell state should be zero-initialized whenever the input sentence changes, so the reset should sit under the mini-batch for-loop instead.
## part of the sentiment-rnn code
# Getting an initial state of all zeros
initial_state = cell.zero_state(batch_size, tf.float32)

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)  ###### i think this line

        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            ###### should be here
            feed = {inputs_: x,
                    labels_: y[:, None],
                    keep_prob: 0.5,
                    initial_state: state}
            loss, state, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)
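For clarity, this is the placement I am suggesting. It is only a sketch: apart from moving the reset into the mini-batch loop, everything is the code above, unchanged.

for e in range(epochs):
    for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
        # reset the LSTM state for every mini-batch instead of once per epoch
        state = sess.run(initial_state)
        feed = {inputs_: x,
                labels_: y[:, None],
                keep_prob: 0.5,
                initial_state: state}
        loss, state, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)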
Can anyone explain why?
Thanks!
Upvotes: 3
Views: 776
Reputation: 10366
- Zero-state initialization is good practice if its impact is low
The default approach to initializing the state of an RNN is to use a zero state. This often works well, particularly for sequence-to-sequence tasks like language modeling, where the proportion of outputs that are significantly affected by the initial state is small.
- Zero-state initialization in each batch can lead to overfitting
Resetting to a zero state for every batch leads to the following: losses at the early steps of a sequence-to-sequence model (i.e., those immediately after a state reset) are larger than those at later steps, because there is less history, so their contribution to the gradient during learning is relatively higher. If every state reset is associated with a zero state, the model can (and will) learn how to compensate for precisely this. As the ratio of state resets to total observations increases, the model parameters become increasingly tuned to this zero state, which may hurt performance on later time steps.
- Do we have other options?
One simple solution is to make the initial state noisy (to decrease the loss for the first time step); a sketch of this idea is shown below. Look here for details and other ideas.
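As an illustration only (this is not from the Udacity notebook), a minimal sketch of a learnable, noisy initial state for a single-layer BasicLSTMCell might look like the following. The names lstm_size and batch_size are assumed hyperparameters, and the resulting state tensor would be passed to tf.nn.dynamic_rnn as its initial_state instead of being fed from a fetched zero state.

import tensorflow as tf

lstm_size = 256   # assumed hyperparameters for the sketch
batch_size = 500

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)

# Learnable per-unit initial state, shared across the batch.
init_c = tf.get_variable('init_c', [1, lstm_size], initializer=tf.zeros_initializer())
init_h = tf.get_variable('init_h', [1, lstm_size], initializer=tf.zeros_initializer())

# Add a little Gaussian noise so the network never sees exactly the same
# reset state twice, which keeps it from tuning itself to one fixed state.
noise_c = tf.random_normal([batch_size, lstm_size], stddev=0.1)
noise_h = tf.random_normal([batch_size, lstm_size], stddev=0.1)

initial_state = tf.nn.rnn_cell.LSTMStateTuple(
    c=tf.tile(init_c, [batch_size, 1]) + noise_c,
    h=tf.tile(init_h, [batch_size, 1]) + noise_h)

# This tensor would then be wired into the graph, e.g.:
# outputs, final_state = tf.nn.dynamic_rnn(cell, embedded_inputs,
#                                          initial_state=initial_state)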
Upvotes: 2