Reputation: 131
I am trying to import a pretrained model from Hugging Face's transformers library and extend it with a few layers for classification using TensorFlow Keras. When I use the transformers model directly (Method 1), the model trains well and reaches a validation accuracy of 0.93 after 1 epoch. However, when I try to use the model as a layer within a tf.keras model (Method 2), it can't get above 0.32 accuracy. As far as I can tell from the documentation, the two approaches should be equivalent. My goal is to get Method 2 working so that I can add more layers to it instead of directly using the logits produced by Hugging Face's classifier head, but I'm stuck at this stage.
import tensorflow as tf
from transformers import TFRobertaForSequenceClassification
Method 1:
model = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
Method 2:
input_ids = tf.keras.Input(shape=(128,), dtype='int32')
attention_mask = tf.keras.Input(shape=(128,), dtype='int32')
transformer = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
encoded = transformer([input_ids, attention_mask])
logits = encoded[0]  # first element of the output tuple holds the logits
model = tf.keras.models.Model(inputs=[input_ids, attention_mask], outputs=logits)
The rest of the code is identical for both methods:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])
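For context, the training call looks roughly like this (a minimal sketch; train_ids, train_mask, and train_labels are placeholder names for my pre-tokenized inputs and integer labels, and the batch size and validation split are illustrative):
# train_ids and train_mask are (num_samples, 128) int32 arrays from the tokenizer;
# train_labels holds integer class ids in [0, 6)
model.fit([train_ids, train_mask], train_labels,
          batch_size=32, epochs=1, validation_split=0.1)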
I am using TensorFlow 2.3.0 and have tried transformers versions 3.5.0 and 4.0.0.
Upvotes: 3
Views: 1945
Reputation: 131
Answering my own question here. I posted a bug report on the Hugging Face GitHub and they fixed this in the new dev version (4.1.0.dev0 as of December 2020). The key change compared to Method 2 above is that the inputs are passed as a dictionary keyed by argument name rather than as a positional list. The snippet below now works as expected:
input_ids = tf.keras.Input(shape=(128,), dtype='int32')
attention_mask = tf.keras.Input(shape=(128,), dtype='int32')
transformer = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
# pass the inputs as a dict keyed by argument name, not a positional list
encoded = transformer({"input_ids": input_ids, "attention_mask": attention_mask})
logits = encoded[0]
model = tf.keras.models.Model(inputs={"input_ids": input_ids, "attention_mask": attention_mask}, outputs=logits)
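Since the original goal was to add more layers, here is a minimal sketch of one way to do that once the fix is in place: swap in the bare TFRobertaModel encoder (no classifier head) and build a custom head on the <s>/CLS token. The dropout rate and dense layer width are illustrative choices, not anything prescribed by the library:
import tensorflow as tf
from transformers import TFRobertaModel

# bare encoder without Hugging Face's classification head
transformer = TFRobertaModel.from_pretrained("roberta-base")

input_ids = tf.keras.Input(shape=(128,), dtype='int32')
attention_mask = tf.keras.Input(shape=(128,), dtype='int32')

# last hidden state has shape (batch, 128, hidden_size); take the <s>/CLS token
hidden = transformer({"input_ids": input_ids, "attention_mask": attention_mask})[0]
cls_token = hidden[:, 0, :]

# illustrative custom head: dropout rate and layer width are arbitrary choices
x = tf.keras.layers.Dropout(0.1)(cls_token)
x = tf.keras.layers.Dense(256, activation='relu')(x)
logits = tf.keras.layers.Dense(6)(x)  # 6 classes, raw logits for from_logits=True loss

model = tf.keras.models.Model(
    inputs={"input_ids": input_ids, "attention_mask": attention_mask},
    outputs=logits)
This model can then be compiled and fit exactly as in the question.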
Upvotes: 3