Reputation: 3699
I use TensorFlow to train a simple two-layer RNN on my data set. The training curve is shown as follows:
where the x-axis shows the steps (in one step, a batch_size number of samples is used to update the network parameters) and the y-axis shows the accuracy. The red, green, and blue lines are the accuracy on the training set, validation set, and test set, respectively. The training curve is not smooth and has some abrupt changes. Is this reasonable?
Upvotes: 1
Views: 653
Reputation: 3256
Have you tried gradient clipping, the Adam optimizer, and learning rate decay? In my experience, gradient clipping prevents exploding gradients, the Adam optimizer converges faster, and learning rate decay improves generalization.
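A minimal sketch of how the three fit together (TensorFlow 1.x graph API; the toy loss, clip norm of 5.0, and decay schedule are illustrative assumptions, not values tuned for your model):

```python
import tensorflow as tf

# Toy stand-in: replace `loss` with your RNN's loss tensor.
w = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(w))

global_step = tf.train.get_or_create_global_step()

# Learning rate decay: halve the base rate every 10000 steps (illustrative values).
learning_rate = tf.train.exponential_decay(
    0.001, global_step, decay_steps=10000, decay_rate=0.5, staircase=True)

# Adam optimizer driven by the decayed rate.
optimizer = tf.train.AdamOptimizer(learning_rate)

# Gradient clipping: rescale gradients so their global norm is at most 5.0.
grads, variables = zip(*optimizer.compute_gradients(loss))
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped, variables),
                                     global_step=global_step)
```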
Have you shuffled the training data?
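For instance, with the tf.data API (hypothetical toy features/labels standing in for your data; buffer and batch sizes are illustrative):

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for your real samples.
features = np.random.rand(1000, 20).astype(np.float32)
labels = np.random.randint(0, 2, size=1000)

dataset = tf.data.Dataset.from_tensor_slices((features, labels))
# Shuffle within a buffer and re-shuffle on every pass over the data.
dataset = dataset.shuffle(buffer_size=1000, reshuffle_each_iteration=True)
dataset = dataset.batch(32).repeat()
```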
In addition, visualizing the distribution of the weights also helps with debugging the model.
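For example, with TensorBoard histogram summaries (TF 1.x; the `sess` and `writer` in the comment are the usual tf.Session and tf.summary.FileWriter from your training loop, assumed here):

```python
import tensorflow as tf

# Log a histogram of every trainable variable for TensorBoard.
for var in tf.trainable_variables():
    tf.summary.histogram(var.op.name, var)
merged = tf.summary.merge_all()

# In the training loop (assuming an existing `sess` and FileWriter `writer`):
#     summary = sess.run(merged)
#     writer.add_summary(summary, global_step=step)
```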
Upvotes: 2
Reputation: 676
The fact that your test and validation accuracy drop sharply at steps 13 and 21 is suspicious; for example, the drop at 13 takes the test score below where it was at epoch 1.
This suggests your learning rate is probably too large: a single mini-batch shouldn't cause that much weight change.
Upvotes: 1
Reputation: 3159
It's absolutely OK since you are using SGD. The general trend is that your accuracy increases as the number of mini-batches processed grows; however, some mini-batches can differ significantly from most of the others, so accuracy on them can be poor.
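One way to see that general trend through the per-batch noise is to smooth the logged accuracies, e.g. with an exponential moving average, which is what TensorBoard's smoothing slider applies (a plain-Python sketch, not part of your training code):

```python
def smooth(values, weight=0.9):
    """Exponential moving average; weight=0.9 mimics TensorBoard's slider."""
    smoothed, last = [], values[0]
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# e.g. smooth([0.60, 0.30, 0.65, 0.70]) downweights the one
# poorly-scoring mini-batch instead of letting it dominate the curve.
```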
Upvotes: 1