Reputation: 886
I am trying to fine-tune a BERT pre-trained model. I am working with the yelp_polarity_reviews data from tensorflow_datasets. I have made sure that:

- The pre-trained model is loaded as a KerasLayer with tensorflow_hub.
- The tokenizer is built with the vocab_file and do_lower_case flag that were used in training the original model.
- The data is converted into a tf.data.Dataset object and the map function is applied, with my Python preprocessing function wrapped in tf.py_function.
- The model receives input_word_ids, input_mask and input_type_ids
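(A minimal sketch of the setup the list describes follows. The Hub handle, the official.nlp.bert tokenization module, the 128-token sequence length, and the classification head are my assumptions for illustration, not taken from the notebook.)

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
from official.nlp.bert import tokenization

MAX_SEQ_LEN = 128  # assumed sequence length
BERT_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/2"  # assumed handle

# 1. Load the pre-trained encoder as a KerasLayer from tensorflow_hub.
bert_layer = hub.KerasLayer(BERT_URL, trainable=True)

# 2. Build the tokenizer from the vocab_file and do_lower_case flag
#    stored with the original model.
vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
tokenizer = tokenization.FullTokenizer(vocab_file, do_lower_case)

# 3. Plain-Python preprocessing that produces the three BERT inputs.
def encode(text, label):
    tokens = tokenizer.tokenize(text.numpy().decode("utf-8"))
    tokens = ["[CLS]"] + tokens[:MAX_SEQ_LEN - 2] + ["[SEP]"]
    ids = tokenizer.convert_tokens_to_ids(tokens)
    pad = MAX_SEQ_LEN - len(ids)
    input_word_ids = ids + [0] * pad
    input_mask = [1] * len(ids) + [0] * pad
    input_type_ids = [0] * MAX_SEQ_LEN  # single-segment input
    return input_word_ids, input_mask, input_type_ids, label

# 4. Wrap the Python function in tf.py_function so Dataset.map can call it.
def encode_tf(text, label):
    word_ids, mask, type_ids, label = tf.py_function(
        encode, [text, label],
        [tf.int32, tf.int32, tf.int32, tf.int64])
    word_ids.set_shape([MAX_SEQ_LEN])
    mask.set_shape([MAX_SEQ_LEN])
    type_ids.set_shape([MAX_SEQ_LEN])
    return (word_ids, mask, type_ids), label

ds = tfds.load("yelp_polarity_reviews", split="train[:1%]", as_supervised=True)
data_train = ds.map(encode_tf).batch(32)

# Classification head: the three inputs feed the BERT layer, whose pooled
# output goes to a sigmoid unit for the binary polarity label.
in_word_ids = tf.keras.Input((MAX_SEQ_LEN,), dtype=tf.int32, name="input_word_ids")
in_mask = tf.keras.Input((MAX_SEQ_LEN,), dtype=tf.int32, name="input_mask")
in_type_ids = tf.keras.Input((MAX_SEQ_LEN,), dtype=tf.int32, name="input_type_ids")
pooled_output, _ = bert_layer([in_word_ids, in_mask, in_type_ids])
output = tf.keras.layers.Dense(1, activation="sigmoid")(pooled_output)
model = tf.keras.Model([in_word_ids, in_mask, in_type_ids], output)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["binary_accuracy"])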
After making sure all of the above is implemented correctly, the model still overfits badly during training. The training accuracy goes up to ~99% while the validation accuracy barely crosses the 50% mark.
I have tried different optimizers, loss functions, and learning rates; I have tried both high and low dropout rates; and I have also tried altering the size of the training data, but after all this the result is no better.
Here is the colab notebook that shows the executed code.
Any suggestions and help would be highly appreciated.
Upvotes: 0
Views: 1454
Reputation: 17219
I checked your colab code and, after a few trials, it appeared that there was an issue with the validation set. That turned out to be right: the mistake was loading the train labels into the test data set.
yelp_test, _ = train_test_split(list(zip(yelp['test']['text'].numpy(),
                                         yelp['test']['label'].numpy())),  # <- correction
                                train_size=0.025,
                                random_state=36)
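For comparison, the broken version presumably paired the test texts with labels from the train split, something like this (a hypothetical reconstruction, not copied from the notebook):

yelp_test, _ = train_test_split(list(zip(yelp['test']['text'].numpy(),
                                         yelp['train']['label'].numpy())),  # <- labels from the wrong split
                                train_size=0.025,
                                random_state=36)

Since the test reviews are then paired with unrelated train labels, the validation targets are effectively random, which is exactly why the validation accuracy stays near 50% while the training accuracy climbs to ~99%.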
Now, if you run the model, you will get:
history = model.fit(data_train,
validation_data=data_valid,
epochs=1,
batch_size=256,
verbose=2)
915ms/step - loss: 0.3309 - binary_accuracy: 0.8473 -
val_loss: 0.1722 - val_binary_accuracy: 0.9354
Upvotes: 1