Gerry

Reputation: 2140

Failed to train toy LSTM in tensorflow

I'm trying to get acquainted with recurrent networks in tensorflow using a toy problem for sequence classification.

Data:

import numpy as np
import tensorflow as tf

half_len = 500
pos_ex = [1, 2, 3, 4, 5] # Positive sequence.
neg_ex = [1, 2, 3, 4, 6] # Negative sequence.
num_input = len(pos_ex)
data = np.concatenate((np.stack([pos_ex]*half_len), np.stack([neg_ex]*half_len)), axis=0)
labels = np.asarray([0, 1] * half_len + [1, 0] * half_len).reshape((2 * half_len, -1))

Model:

_, x_width = data.shape
X = tf.placeholder("float", [None, x_width])
Y = tf.placeholder("float", [None, num_classes])

weights = tf.Variable(tf.random_normal([num_input, n_hidden]))
bias = tf.Variable(tf.random_normal([n_hidden]))


def lstm_model():
    from tensorflow.contrib import rnn
    x = tf.split(X, num_input, 1)
    rnn_cell = rnn.BasicLSTMCell(n_hidden)
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights) + bias

Training:

logits = lstm_model()
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Train...

My training accuracy hovers around 0.5, which confuses me because the problem is very simple.

Step 1, Minibatch Loss = 82.2726, Training Accuracy = 0.453
Step 25, Minibatch Loss = 6.7920, Training Accuracy = 0.547
Step 50, Minibatch Loss = 0.8528, Training Accuracy = 0.500
Step 75, Minibatch Loss = 0.6989, Training Accuracy = 0.500
Step 100, Minibatch Loss = 0.6929, Training Accuracy = 0.516

Changing the toy data to:

pos_ex = [1, 2, 3, 4, 5]
neg_ex = [1, 2, 3, 4, 100]

yields instant convergence to accuracy 1. Could anyone please explain why this network fails on such a simple task? Thank you.

The code above is based on this tutorial.

Upvotes: 1

Views: 139

Answers (1)

mr_mo

Reputation: 1528

Have you tried reducing the learning rate?
In the second example the separation on the last coordinate is much larger in value. In principle that should make no difference, but it does affect the choice of learning rate.
If you normalize the data (scale each coordinate to the range -1 to 1) and pick an appropriate step size, you should be able to solve both problems in the same number of steps.
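
For example, here is a minimal sketch of that normalization idea: scale each coordinate (column) of the toy data to [-1, 1] before feeding it to the network. The helper scale_to_unit_range is just illustrative and not part of the original code.

import numpy as np

def scale_to_unit_range(data):
    # Min-max scale each column to [-1, 1]; constant columns map to -1.
    col_min = data.min(axis=0, keepdims=True)
    col_max = data.max(axis=0, keepdims=True)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero
    return 2.0 * (data - col_min) / span - 1.0

half_len = 500
pos_ex = [1, 2, 3, 4, 5]
neg_ex = [1, 2, 3, 4, 100]
data = np.concatenate((np.stack([pos_ex] * half_len),
                       np.stack([neg_ex] * half_len)), axis=0).astype(np.float32)

data_scaled = scale_to_unit_range(data)
# The last coordinate is now -1.0 for the positive rows and 1.0 for the
# negative rows, whether the raw values were 5/6 or 5/100, so the same
# learning rate should work for both versions of the problem.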

EDIT: I played with your toy examples a little; the following works even without normalization:

import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn

# Meta parameters

n_hidden = 10
num_classes = 2
learning_rate = 1e-2
input_dim = 5
num_input = 5



# inputs
X = tf.placeholder("float", [None, input_dim])
Y = tf.placeholder("float", [None, num_classes])

# Model
def lstm_model():
    # input layer
    x = tf.split(X, num_input, 1)

    # LSTM layer
    rnn_cell = rnn.BasicLSTMCell(n_hidden)
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)

    # final layer - softmax
    weights = tf.Variable(tf.random_normal([n_hidden, num_classes]))
    bias = tf.Variable(tf.random_normal([num_classes]))
    return tf.matmul(outputs[-1], weights) + bias

# logits and prediction
logits = lstm_model()
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)


# -----------
# Train func
# -----------
def train(data,labels):

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        for i in range(1000):
            _, loss, onehot_pred = session.run([train_op, loss_op, prediction], feed_dict={X: data, Y: labels})
            acc = np.mean(np.argmax(onehot_pred,axis=1) == np.argmax(labels,axis=1))
            print('Iteration {} accuracy: {}'.format(i,acc))
            if acc == 1:
                print('---> Finished after {} iterations'.format(i+1))
                break

# -----------
# Train 1
# -----------
# data generation
half_len = 500
pos_ex = [1, 2, 3, 4, 5] # Positive sequence.
neg_ex = [1, 2, 3, 4, 6] # Negative sequence.



data = np.concatenate((np.stack([pos_ex]*half_len), np.stack([neg_ex]*half_len)), axis=0)
labels = np.asarray([0, 1] * half_len + [1, 0] * half_len).reshape((2 * half_len, -1))

train(data,labels)

# -----------
# Train 2
# -----------
# data generation
half_len = 500
pos_ex = [1, 2, 3, 4, 5] # Positive sequence.
neg_ex = [1, 2, 3, 4, 100] # Negative sequence.


data = np.concatenate((np.stack([pos_ex]*half_len), np.stack([neg_ex]*half_len)), axis=0)
labels = np.asarray([0, 1] * half_len + [1, 0] * half_len).reshape((2 * half_len, -1))

train(data,labels)

Output is:

Iteration 0 accuracy: 0.5
Iteration 1 accuracy: 0.5
Iteration 2 accuracy: 0.5
Iteration 3 accuracy: 0.5
Iteration 4 accuracy: 0.5
Iteration 5 accuracy: 0.5
Iteration 6 accuracy: 0.5
Iteration 7 accuracy: 0.5
Iteration 8 accuracy: 0.5
Iteration 9 accuracy: 0.5
Iteration 10 accuracy: 1.0
---> Finished after 11 iterations

Iteration 0 accuracy: 0.5
Iteration 1 accuracy: 0.5
Iteration 2 accuracy: 0.5
Iteration 3 accuracy: 0.5
Iteration 4 accuracy: 0.5
Iteration 5 accuracy: 0.5
Iteration 6 accuracy: 0.5
Iteration 7 accuracy: 0.5
Iteration 8 accuracy: 0.5
Iteration 9 accuracy: 1.0
---> Finished after 10 iterations

Good luck!

Upvotes: 1
