TensorFlow RNN model with fixed step output error

I started a very simple RNN project to solidify my knowledge of TF: a basic sequence generator using LSTMs. The project is a many-to-one sequence prediction; the input is a window of 4 integers and the output is a single float per window. The minimum input value is 1 and the maximum is 61, so I can predict from 61 onward. I simply use one batch with all the inputs, which has shape [58, 4, 1], and the outputs, which have shape [58, 1]. For better visualization, the inputs and outputs are written below.

        Inputs                     Labels
[[[ 1],[ 2],[ 3],[ 4]], -------> [[0.0493],
 [[ 2],[ 3],[ 4],[ 5]], ------->  [0.0634],
 [[ 3],[ 4],[ 5],[ 6]], ------->  [0.0773],
 [[ 4],[ 5],[ 6],[ 7]], ------->  [0.0909],
   ..   ..   ..   ..    ------->     ...  ,
 [[55],[56],[57],[58]], ------->  [0.5503],
 [[56],[57],[58],[59]], ------->  [0.5567],
 [[57],[58],[59],[60]], ------->  [0.5630],
 [[58],[59],[60],[61]]] ------->  [0.5693]]
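
For reference, windows of this shape can be produced with a simple sliding-window helper. This is only a minimal sketch (make_windows is just an illustrative name, and the scaled labels wnd_y_scl are computed separately and not shown here):

import numpy as np

def make_windows(series, window=4):
    # Slice a 1-D series into overlapping windows of length `window`,
    # shaped [num_windows, window, 1] to match the RNN input above.
    rows = [series[i:i + window] for i in range(len(series) - window + 1)]
    return np.array(rows, dtype=np.float32).reshape(-1, window, 1)

wnd_x = make_windows(np.arange(1, 62))  # integers 1..61 -> 58 windows
print(wnd_x.shape)  # (58, 4, 1)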

The training part went very well and I could reach an accuracy of about 0.991 after 500 epochs, but when I try to predict values from 61 to 118, the output shows a fixed step down for all predicted values while somehow still having the right behavior.
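
To be clear about the metric: "accuracy" here is just the custom measure defined in the graph code below, i.e. one minus the mean absolute error between the sigmoid output and the label. For example, using the first three rows of the training output shown further down:

import numpy as np

# accuracy = mean(1 - |output - label|), as defined in the graph below
output = np.array([0.0591, 0.0802, 0.0777])   # network outputs (first 3 rows)
labels = np.array([0.0493, 0.0634, 0.0773])   # scaled labels (first 3 rows)
print(np.mean(1.0 - np.abs(output - labels)))  # ~0.991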

Because the purpose of this project is just to learn the basics, I decided to use only the simplest functions in TF, so the seq2seq facilities were left out. The code for the RNN is written below:

def build_lstm(cell_length, cell_depth, batch_size, keep_prob):
    def lstm_row(cell_length, keep_prob):
        cell_row = tf.contrib.rnn.BasicLSTMCell(cell_length)
        cell_row = tf.contrib.rnn.DropoutWrapper(cell_row, keep_prob)
        return cell_row

    cell = tf.contrib.rnn.MultiRNNCell([lstm_row(cell_length, keep_prob) for _ in range(cell_depth)])
    initial_state = cell.zero_state(batch_size, tf.float32)

    return cell, initial_state

tf.reset_default_graph()

inputs = tf.placeholder(tf.float32, [None, feature_length, 1], name='inputs')
labels = tf.placeholder(tf.float32, [None, output_length], name='labels')
keep_prob = tf.placeholder(tf.float32, name='kpprob')

lstm_cell, initial_state = build_lstm(40, 2, batch_size=batch_size, keep_prob=keep_prob)
lstm_output, final_state = tf.nn.dynamic_rnn(lstm_cell, inputs, initial_state=initial_state)
lstm_output_seq = lstm_output[:, -1, :]  # keep only the last time step (many-to-one)

dense_0 = tf.layers.dense(inputs=lstm_output_seq, units=120, activation=tf.nn.relu)
dropout_0 = tf.layers.dropout(dense_0, rate=0.7)

with tf.variable_scope('sigmoid'):
    W = tf.Variable(tf.truncated_normal((120, 1), stddev=0.1), name='weights')
    b = tf.Variable(tf.zeros(1), name='bias')
logits = tf.matmul(dropout_0, W) + b

output = tf.nn.sigmoid(logits, name='output')

loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
correct_predictions = tf.abs(output - labels)
total_correct = tf.ones_like(correct_predictions)
accuracy = tf.reduce_mean(total_correct - correct_predictions)
learning_rate = tf.placeholder(tf.float32, name='learning_rate')
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

l_rate = 0.001
epochs = 500
kp_prob = 0.7

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for e in range(epochs):
        new_state = session.run(initial_state)
        feeder = {
            inputs: wnd_x,
            labels: wnd_y_scl,
            keep_prob: kp_prob,
            learning_rate: l_rate,
            initial_state: new_state
        }
        session_loss, session_accuracy, session_output, _, last_state = session.run(
            [loss, accuracy, output, optimizer, final_state], feed_dict=feeder)
        print('Epoch {0}/{1}:\t'.format(e, epochs),
              'training loss {0}\t'.format(session_loss),
              'accuracy {0}\t'.format(session_accuracy))

    new_state = session.run(initial_state)
    feeder = {
        inputs: unseen_data_rsp,
        keep_prob: 1.0,
        initial_state: new_state
    }
    session_output = session.run([output], feed_dict=feeder)

As mentioned before, during the inference phase the predictions have a fixed step down but somehow show the right behavior, i.e. the derivative of the curve changes correctly across the time steps.

During the training phase I have the following output:

Epoch 999/1000: training loss = 0.5913468599319458 | accuracy = 0.9909629225730896
         Input               Label          Output
[[ 1],[ 2],[ 3],[ 4]]  -->  [0.0493]  ...  [0.0591]
[[ 2],[ 3],[ 4],[ 5]]  -->  [0.0634]  ...  [0.0802]
[[ 3],[ 4],[ 5],[ 6]]  -->  [0.0773]  ...  [0.0777]
[[ 4],[ 5],[ 6],[ 7]]  -->  [0.0909]  ...  [0.1035]
  ..   ..   ..   ..    ...     ...            ...
[[55],[56],[57],[58]]  -->  [0.5503]  ...  [0.5609]
[[56],[57],[58],[59]]  -->  [0.5567]  ...  [0.5465]
[[57],[58],[59],[60]]  -->  [0.5630]  ...  [0.5543]
[[58],[59],[60],[61]]  -->  [0.5693]  ...  [0.5614]

And during the inference phase I have the following output:

          Input                Prediction
[[ 58],[ 59],[ 60],[ 61]]  -->  [0.4408]
[[ 59],[ 60],[ 61],[ 62]]  -->  [0.4459]
[[ 60],[ 61],[ 62],[ 63]]  -->  [0.4510]
[[ 61],[ 62],[ 63],[ 64]]  -->  [0.4559]
  ...   ...   ...   ...    ...     ...
[[112],[113],[114],[115]]  -->  [0.6089]
[[113],[114],[115],[116]]  -->  [0.6101]
[[114],[115],[116],[117]]  -->  [0.6113]
[[115],[116],[117],[118]]  -->  [0.6124]

As you can see, the first input of the inference is the same as the last input of the training. What I don't understand is why the same input gives me two different outputs, and why these outputs have a fixed step down of around 0.11. Thanks for any help, and sorry for the long text; I can make it shorter upon request.

Upvotes: 1

Views: 126

Answers (1)

dwjbosman

Reputation: 966

During inference you are resetting the state, so you get two different outputs for the same input because the state of the network differs between the two cases.

To keep the state after a prediction you would need to do something like this:

# iterate over each prediction window, carrying the LSTM state forward
# (prediction_windows here stands for your sequence of inference inputs)
for unseen_data_rsp in prediction_windows:
    feeder = {
        inputs: unseen_data_rsp,
        keep_prob: 1.0,
        initial_state: last_state
    }
    session_output, last_state = session.run([output, final_state], feed_dict=feeder)

Also, to get exactly the training result on the first input of inference, you would need to first present all the training examples, so that you start inference with the correct state. Another approach would be to save the state of the network after training and reuse it during prediction; a rough sketch of the first idea is below.
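
A minimal sketch of that first idea, reusing the names from the question (the training windows are only presented again to rebuild the state, no optimizer step is run):

# Re-present the training windows (keep_prob = 1.0, no optimizer step)
# so the network ends up in the state it had at the end of training.
warmup_feeder = {inputs: wnd_x, keep_prob: 1.0}
warm_state = session.run(final_state, feed_dict=warmup_feeder)

# Start inference from that state instead of a fresh zero state.
feeder = {
    inputs: unseen_data_rsp,
    keep_prob: 1.0,
    initial_state: warm_state
}
session_output = session.run(output, feed_dict=feeder)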

Upvotes: 0
