mining

Reputation: 3699

How to interpret the strange training curve for RNN?

I am using TensorFlow to train a simple two-layer RNN on my dataset. The training curve is shown below:

[Figure: RNN training curve]

The x-axis shows training steps (at each step, one batch of batch_size samples is used to update the network parameters) and the y-axis shows accuracy. The red, green, and blue lines show accuracy on the training, validation, and test sets, respectively. The training curve is not smooth and has some abrupt drops. Is this reasonable?

Upvotes: 1

Views: 653

Answers (3)

Jiang Xiang

Reputation: 3256

Have you tried gradient clipping, the Adam optimizer, and learning rate decay? In my experience, gradient clipping can prevent exploding gradients, the Adam optimizer can converge faster, and learning rate decay can improve generalization.
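For reference, a minimal sketch of all three, assuming a TF1-style graph setup. The toy variable and `loss` are stand-ins for your RNN's actual training loss, and the decay schedule and clip norm are example values, not recommendations:

    import tensorflow as tf

    # Toy stand-ins so the snippet runs; replace with your RNN's loss.
    w = tf.Variable(tf.random_normal([10]))
    loss = tf.reduce_sum(tf.square(w))

    global_step = tf.Variable(0, trainable=False)

    # Learning rate decay: halve the rate every 10k steps (example values).
    learning_rate = tf.train.exponential_decay(
        0.001, global_step, decay_steps=10000, decay_rate=0.5, staircase=True)

    optimizer = tf.train.AdamOptimizer(learning_rate)

    # Gradient clipping: rescale gradients so their global norm is at most 5.0.
    grads, params = zip(*optimizer.compute_gradients(loss))
    clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
    train_op = optimizer.apply_gradients(zip(clipped, params),
                                         global_step=global_step)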

Have you shuffled the training data?
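If not, a simple NumPy-based sketch (with hypothetical `x_train`/`y_train` arrays standing in for your data) is to reshuffle the sample order at the start of each epoch:

    import numpy as np

    # Hypothetical training arrays; replace with your own data.
    x_train = np.random.randn(100, 20, 8)   # (samples, time steps, features)
    y_train = np.random.randint(0, 2, size=100)

    # Reshuffle once per epoch so mini-batches are not drawn in a fixed order.
    perm = np.random.permutation(len(x_train))
    x_train, y_train = x_train[perm], y_train[perm]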

In addition, visualizing the distribution of the weights also helps with debugging the model.
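One way to do this, assuming TF1-style summaries (the writer setup in the final comment is hypothetical), is to log a TensorBoard histogram for every trainable variable:

    import tensorflow as tf

    # Toy variable so the snippet runs; in practice these would be
    # your RNN's weight matrices.
    w = tf.Variable(tf.random_normal([10, 10]), name='rnn_weights')

    # One histogram per trainable variable; inspect them in TensorBoard.
    for var in tf.trainable_variables():
        tf.summary.histogram(var.op.name, var)
    merged = tf.summary.merge_all()
    # In the training loop: writer.add_summary(sess.run(merged), step)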

Upvotes: 2

MMN

Reputation: 676

The fact that your test and validation accuracy drop sharply at steps 13 and 21 is suspicious. For example, the drop at step 13 takes the test score below its value at step 1.

This implies your learning rate is probably too large: a single mini-batch shouldn't cause that amount of weight change.
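One way to check this, as a sketch rather than a prescription, is to log the global gradient norm each step: if a single mini-batch spikes it, the corresponding weight update is proportionally large. The toy `loss` and the plain-SGD optimizer below are stand-ins for your own setup:

    import tensorflow as tf

    # Toy stand-ins so the snippet runs; use your own loss and optimizer.
    w = tf.Variable(tf.random_normal([10]))
    loss = tf.reduce_sum(tf.square(w))
    optimizer = tf.train.GradientDescentOptimizer(0.01)

    grads_and_vars = optimizer.compute_gradients(loss)

    # A spike in this scalar on one step means that mini-batch produced
    # an outsized update, which suggests the learning rate is too large.
    grad_norm = tf.global_norm([g for g, _ in grads_and_vars])
    tf.summary.scalar('grad_norm', grad_norm)
    train_op = optimizer.apply_gradients(grads_and_vars)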

Upvotes: 1

Dmytro Danevskyi

Reputation: 3159

This is perfectly normal since you are using SGD. The general trend is that accuracy increases as the number of processed mini-batches grows; however, some mini-batches can differ significantly from most of the others, so accuracy can temporarily be poor on them.
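If you want to see the underlying trend through this mini-batch noise, a simple sketch is to smooth the logged accuracies with a moving average (the `accuracy_log` array below is a stand-in for your own log):

    import numpy as np

    def moving_average(values, window=10):
        """Smooth a 1-D series with a simple sliding-window mean."""
        kernel = np.ones(window) / window
        return np.convolve(values, kernel, mode='valid')

    accuracy_log = np.random.rand(100)  # stand-in for your per-step accuracies
    smoothed = moving_average(accuracy_log)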

Upvotes: 1
