Reputation: 789
I am working with text sequences with a sequence length between 1 and 3. The labels are a "score". I have over 5 million samples. My network looks like this (Keras):
from keras.models import Sequential
from keras.layers import Embedding, BatchNormalization, Dense, Flatten

model = Sequential()
# Embedding output is 3D (batch, 3, 128); the Dense layers below are applied
# to the last axis, i.e. independently per timestep, until the Flatten.
model.add(Embedding(word_count, 128, input_length=3))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(1024, activation='relu'))
model.add(Flatten())
model.add(Dense(1, activation='linear'))
I have tried many different network shapes and configurations, including with and without Dropout and BatchNorm, but my loss always looks like this:
[loss plot]
I am using a batch size of 1024 and the Adam optimiser.
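In code terms, the compile/fit call is wired up roughly like this (loss_fn and the array names are placeholders; only the optimiser and batch size are fixed):

model.compile(optimizer='adam', loss=loss_fn)  # loss_fn: a regression loss, not specified above
model.fit(X_train, y_train,
          batch_size=1024,
          validation_data=(X_test, y_test),
          epochs=20)  # epoch count arbitrary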
As far as I can tell, there are no differences between the training and testing datasets with regard to pre-processing etc.
Any suggestions on how I can diagnose this?
Upvotes: 3
Views: 4393
Reputation: 789
I found the problem. I was shuffling the test data between epochs, when I meant to shuffle only the training data. Thank you for your comments.
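For anyone hitting the same thing: when shuffling manually, it is easy to permute the inputs and labels independently, or to touch the held-out set by accident. A minimal sketch of a per-epoch shuffle that only touches the training split (the array names X_train, y_train, X_test and y_test are placeholders, not from my actual code):

import numpy as np

for epoch in range(10):
    # One shared permutation keeps inputs and labels aligned.
    idx = np.random.permutation(len(X_train))
    model.fit(X_train[idx], y_train[idx],
              batch_size=1024, epochs=1,
              validation_data=(X_test, y_test))  # test set is never reordered

If a single model.fit call runs all epochs, Keras already does this for you: shuffle=True (the default) reshuffles only the training data each epoch and never touches validation_data.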
Upvotes: 2
Reputation: 13
First of all, you should split your dataset:
model.fit(X, Y, validation_split=0.1, epochs=100, batch_size=10)
Then you can see how the validation loss changes relative to the training loss.
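With a validation split, model.fit returns a History object whose history dict records the per-epoch training and validation loss, so you can compare the two curves directly. Note that validation_split takes the last 10% of the samples as given, before any shuffling, so the data should not be ordered by score. A minimal sketch:

history = model.fit(X, Y, validation_split=0.1, epochs=100, batch_size=10)

# 'loss' is the training loss, 'val_loss' the validation loss, one entry per epoch.
print(history.history['loss'])
print(history.history['val_loss'])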
Upvotes: 1