Genzo Ito

Reputation: 169

TensorFlow LSTM: Why is the test accuracy low, but not the training accuracy?

I have tried to build an LSTM model with TensorFlow. Training the LSTM seems to work fine, reaching more than 90% accuracy. The problem that plagues me is the test accuracy, which is very low. So I thought this was due to over-fitting? But attempts such as increasing the training batch or reducing the element_size (from 10 to 5) were a waste of effort, nor did applying "dropout" solve it. I would like some direction on how to improve my code to achieve a higher test accuracy. The following is a summary of my data/parameters:

Input variable: standardized economic time-series data
Output variable: categorical features (labels) converted by one-hot encoding

Sequence_length: 20
Element_size: 5
Hidden_layer: 80
Categories (labels): 30
Training batch: 924
Test batch: 164
Learning rate: 0.0005 (is it too low?)

Here is the code I built:

import numpy as np
import tensorflow as tf

# Hyperparameters (values listed in the summary above)
step_time = 20        # sequence length
element_size = 5
hidden_layer = 80
label_num = 30
learn_rate = 0.0005
batch_size = x_batch.shape[0]   # 1088 samples in total

# Split x_batch and y_batch into train/test (85/15)
train_x,test_x=np.split(x_batch,[int(batch_size*0.85)])
train_y,test_y=np.split(y_batch,[int(batch_size*0.85)])
print('train_x shape: {0} and test_x shape: {1}'.format(train_x.shape,test_x.shape))
print('train_y shape: {0} and test_y shape: {1}'.format(train_y.shape,test_y.shape))

# Create placeholders for inputs and labels
inputs=tf.placeholder(tf.float32,shape=[None,step_time,element_size],name='inputs')
y=tf.placeholder(tf.float32,shape=[None,label_num],name='y')

# TensorFlow built-in LSTM cell wrapped with dropout
with tf.variable_scope('lstm'):
    lstm_cell=tf.contrib.rnn.LSTMCell(hidden_layer,forget_bias=1.0)
    cell_drop=tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=0.7)
    outputs,states=tf.nn.dynamic_rnn(cell_drop,inputs,dtype=tf.float32) 
    print('outputs shape: {0}'.format(outputs.shape))

W1={'linear_layer':tf.Variable(tf.truncated_normal([hidden_layer,label_num],mean=0,stddev=.01))}
b1={'linear_layer':tf.Variable(tf.truncated_normal([label_num],mean=0,stddev=.01))}

#Extract the last relevant output and use in a linear layer
final_output=tf.matmul(outputs[:,-1,:],W1['linear_layer'])+b1['linear_layer']

with tf.name_scope('cross_entropy'):
    softmax=tf.nn.softmax_cross_entropy_with_logits(logits=final_output,labels=y)
    cross_entropy=tf.reduce_mean(softmax)

with tf.name_scope('train'):
    train_step=tf.train.AdamOptimizer(learn_rate,0.9).minimize(cross_entropy)

with tf.name_scope('accuracy'):
    correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(final_output,1))
    accuracy=(tf.reduce_mean(tf.cast(correct_prediction,tf.float32)))*100

#Training
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())    
    for step in range(5000):
        sess.run(train_step,feed_dict={inputs:train_x,y:train_y})
        if step % 500 == 0:
            acc=sess.run(accuracy,feed_dict={inputs:train_x,y:train_y})
            loss=sess.run(cross_entropy,feed_dict={inputs:train_x,y:train_y})
            print('Iter '+str(step)+', Minibatch loss= '+'{:.6f}'.format(loss)+', Training Accuracy= '+'{:.5f}'.format(acc))

    # Test on the held-out data
    test_acc=sess.run(accuracy,feed_dict={inputs:test_x,y:test_y})
    print("Test Accuracy is {0}".format(test_acc))

and its result is

Input Shape: (21760, 5)
Output Shape: (21760, 30)
x_batch shape: (1088, 20, 5)
y_batch shape: (1088, 30)
train_x shape: (924, 20, 5) and test_x shape: (164, 20, 5)
train_y shape: (924, 30) and test_y shape: (164, 30)
outputs shape: (?, 20, 80)
Iter 0, Minibatch loss= 3.398923, Training Accuracy= 5.30303
Iter 500, Minibatch loss= 2.027734, Training Accuracy= 38.09524
Iter 1000, Minibatch loss= 1.340760, Training Accuracy= 61.79654
Iter 1500, Minibatch loss= 1.010518, Training Accuracy= 72.83550
Iter 2000, Minibatch loss= 0.743997, Training Accuracy= 79.76190
Iter 2500, Minibatch loss= 0.687736, Training Accuracy= 79.76190
Iter 3000, Minibatch loss= 0.475408, Training Accuracy= 85.17316
Iter 3500, Minibatch loss= 0.430477, Training Accuracy= 87.22944
Iter 4000, Minibatch loss= 0.359262, Training Accuracy= 89.17749
Iter 4500, Minibatch loss= 0.274463, Training Accuracy= 90.69264
Test Accuracy is 4.878048419952393

This is my first time using TensorFlow and building an LSTM model, so I know I am doing something wrong but cannot put my finger on it.

Can someone provide some direction?

    

Upvotes: 1

Views: 2703

Answers (3)

Genzo Ito

Reputation: 169

I seem to have arrived at an answer in light of dennlinger's informative answer. First of all, I divided the training data into six sets (x_1, x_2, ... x_6 and y_1, y_2, ... y_6), each around the same size as the test data. I'm not sure whether to use one of them as the third validation set you mentioned, but I will try to apply it. What's more, I checked which classes each set does not contain (the check is sketched after the list below); for example, y_1 does not contain classes 11, 16, 21, 22, and 25:

train_y : []
y_1     : [11, 16, 21, 22, 25]
y_2     : [11, 14, 16, 23]
y_3     : [11, 19, 21, 23]
y_4     : [14, 21, 23]
y_5     : [16, 21, 22, 23]
y_6     : [11, 21, 22, 23]
test_y  : [11, 21, 22, 23]
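For reference, here is a minimal sketch of the kind of check I describe above. The helper name missing_classes is just for illustration, and it assumes the labels are one-hot encoded NumPy arrays as in the question:

import numpy as np

# List which of the 30 classes do not appear in a one-hot encoded label array
def missing_classes(one_hot_labels, num_classes=30):
    present = np.unique(np.argmax(one_hot_labels, axis=1))
    return sorted(set(range(num_classes)) - set(present))

print(missing_classes(test_y))   # e.g. [11, 21, 22, 23]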

The first examination (validation) was to train on the x_1/y_1 set and compute the accuracy on the test data. Although I stopped training at each of the steps below, the performance did not improve; the results were almost the same.

Stop at step 1000
Iter 500, Minibatch loss= 1.976426, Training Accuracy= 46.01227
Test Accuracy is 7.317072868347168
Stop at step 1500
Iter 1000, Minibatch loss= 1.098709, Training Accuracy= 66.25767
Test Accuracy is 4.2682929039001465
Stop at step 2000
Iter 1500, Minibatch loss= 0.906059, Training Accuracy= 74.23312
Test Accuracy is 6.097560882568359
Stop at step 2500
Iter 2000, Minibatch loss= 0.946361, Training Accuracy= 76.07362
Test Accuracy is 6.707317352294922

Next, I examined the performance of a few combinations; the results are below:

Train on x_6/y_6 and test on the test data
Iter 2500, Minibatch loss= 0.752621, Training Accuracy= 79.77941
Test Accuracy is 78.65853881835938

Train on x_6/y_6 and test on x_5/y_5
Iter 2500, Minibatch loss= 0.772954, Training Accuracy= 78.67647
Test Accuracy is 3.658536434173584

Train on the training data and test on x_4/y_4
Iter 3000, Minibatch loss= 1.980538, Training Accuracy= 41.01731
Test Accuracy is 37.42331314086914

Interestingly, the combination trained on the x_6/y_6 set and tested on the test data performed much better than the previous ones: the test accuracy increased to about 78 percent. I assume this is because the classes are identical, i.e. y_6 contains all the classes present in the test data (see above), and the sets are the same size. So this shows that I have to consider carefully which data sets are suitable and validate the LSTM model under various conditions, which is very important.

On the other hand, CHG, decreasing the number of neurons (from 80 to 10 or 5) and the batch size didn't improve the performance at all.

Upvotes: 1

dennlinger

Reputation: 11508

Before I go into more details:
I am assuming that you are referring to batch_size when talking about element_size? If I am wrong in that assumption, please correct me here.

As the other answer mentioned, one potential reason could be overfitting, i.e. you are trying "too hard" with your training data. One general way to detect this is to keep track of the performance on unseen data with held-back validation samples. That is, instead of splitting two ways (train/test), you have a third validation set (usually around the same size as the test data), and check every now and then during training how your model performs on this validation data.

A common observation is the following curve: [image: training error keeps decreasing with training time, while validation error reaches a minimum and then starts rising again]. As you can see, the model improves constantly on the training data, but it does so by sacrificing the ability to generalize to unseen data.

Generally, you try to stop training at the point where the error on the validation set is minimal, even if that does not give optimal results on your training data. We then expect the model to perform best on the (completely unseen) test set.
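As a rough illustration only, such a stopping rule could look like the sketch below. It reuses the ops from the question's graph (train_step, cross_entropy, inputs, y); val_x/val_y and the patience value are my own assumptions:

# Early-stopping sketch: monitor validation loss and stop when it no longer improves
best_val_loss = float('inf')
patience, bad_checks = 5, 0

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(5000):
        sess.run(train_step, feed_dict={inputs: train_x, y: train_y})
        if step % 100 == 0:
            val_loss = sess.run(cross_entropy, feed_dict={inputs: val_x, y: val_y})
            if val_loss < best_val_loss:
                best_val_loss, bad_checks = val_loss, 0
                # ideally save the current weights here, e.g. with tf.train.Saver()
            else:
                bad_checks += 1
                if bad_checks >= patience:
                    print('Stopping early at step', step)
                    break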

As a quick side note, if you are doing this in TensorFlow (which I am not 100% familiar with): generally, you have to "switch" your model from training to evaluation mode to get the actual results on your validation set (and not accidentally train on it as well), but you can find plenty of actual implementations of this online.
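One concrete instance of this in your code is the hard-coded output_keep_prob=0.7: with that, dropout stays active when you evaluate. A minimal sketch of one way to rewrite that part of the graph so it is switchable at feed time (reusing hidden_layer and inputs from the question) would be:

# Feed-controlled dropout: defaults to 1.0 (no dropout) unless overridden
keep_prob = tf.placeholder_with_default(1.0, shape=(), name='keep_prob')

with tf.variable_scope('lstm'):
    lstm_cell = tf.contrib.rnn.LSTMCell(hidden_layer, forget_bias=1.0)
    cell_drop = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
    outputs, states = tf.nn.dynamic_rnn(cell_drop, inputs, dtype=tf.float32)

# Training: dropout active
# sess.run(train_step, feed_dict={inputs: train_x, y: train_y, keep_prob: 0.7})
# Validation/test: keep_prob falls back to its default of 1.0, so dropout is off
# sess.run(accuracy, feed_dict={inputs: val_x, y: val_y})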

Furthermore, overfitting might be an issue if you have too many neurons! In your case, you have only around 900 training examples, but already 80 hidden units, which is IMO a ratio that is way too high. You could try using fewer neurons and see whether that improves the accuracy on your test set, even if that might reduce the accuracy on the training data, too.
In the end, you want a compact descriptor of your problem, not a network that "learns" to recognize every single one of your training instances.

Furthermore, if you actually do work with mini batches, you could try reducing the batch size even further. I really like this one tweet from Yann LeCun, so I will just post it here, too ;-)
Joke aside, training with smaller batches can lead to better generalization as well, as absurd as it sounds. Large batches are generally only really helpful if you have a massive training set, or are training on a GPU (since then the copies between GPU and main memory are very costly, and mini batches reduce the number of such operations), or if you need a long time to reach convergence.

Since you are using an LSTM architecture (which, due to its sequentiality, performs similarly on CPUs and GPUs, since there is not much to parallelize), a larger batch size will likely not increase your (computational) performance, but smaller batches might improve the accuracy.
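Note that the code in the question actually feeds the full training set at every step. A rough mini-batch loop, reusing the graph from the question (the batch size of 32 and the epoch count are just placeholder choices of mine), might look like:

import numpy as np

mini_batch = 32
num_train = train_x.shape[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(200):
        perm = np.random.permutation(num_train)            # reshuffle each epoch
        for start in range(0, num_train, mini_batch):
            idx = perm[start:start + mini_batch]
            sess.run(train_step, feed_dict={inputs: train_x[idx], y: train_y[idx]})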

Lastly, and this is why I commented on the other answer initially, we might be completely off in this explanation here, and it could be a totally different reason after all.

What many people tend to forget is to do some initial exploratory analysis on your test/train split. If you have only representatives of one class in your test set, but barely any in your training data, the results will likely not be good. Similarly, if you only train on 29 out of your 30 classes, it will be hard for the network to recognize any sample of the 30th class.

To avoid this, make sure you have a somewhat even split (i.e. sample a certain number of examples from each class for both the test and training sets), and check that the classes are somewhat evenly distributed.
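As one possible way to do this (a sketch only, assuming x_batch and y_batch are the arrays from the question, with one-hot labels, and that every class has at least a couple of samples), scikit-learn's train_test_split can stratify on the class labels:

import numpy as np
from sklearn.model_selection import train_test_split

labels = np.argmax(y_batch, axis=1)                  # integer class per sample
train_x, test_x, train_y, test_y = train_test_split(
    x_batch, y_batch, test_size=0.15, stratify=labels, random_state=0)

# Sanity check: every class should now appear on both sides of the split
print('classes in train:', len(np.unique(np.argmax(train_y, axis=1))))
print('classes in test :', len(np.unique(np.argmax(test_y, axis=1))))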

Doing so might save you a surprising amount of pain later, and generally helps to improve performance on completely new data as well. Always remember: Deep Learning doesn't magically solve all the problems you have in predictive analysis; it just gives you a very powerful tool to tackle a specific sub-problem.

Upvotes: 2

user3426943

Reputation:

If the training accuracy continues to go up but the test accuracy goes down, then you are overfitting. Try running fewer epochs or using a lower learning rate.

Upvotes: 0
