Deepdelusion

Reputation: 131

How to mix tensorflow keras model and transformers

I am trying to import a pretrained model from Hugging Face's transformers library and extend it with a few layers for classification using tensorflow keras. When I use the transformers model directly (Method 1), it trains well and reaches a validation accuracy of 0.93 after 1 epoch. However, when I wrap the same model as a layer inside a tf.keras model (Method 2), accuracy never gets above 0.32. As far as I can tell from the documentation, the two approaches should be equivalent. My goal is to get Method 2 working so that I can add more layers to it instead of directly using the logits produced by Hugging Face's classifier head, but I'm stuck at this stage.

import tensorflow as tf
from transformers import TFRobertaForSequenceClassification

Method 1:

model = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)

Method 2:

input_ids = tf.keras.Input(shape=(128,), dtype='int32')
attention_mask = tf.keras.Input(shape=(128,), dtype='int32')

transformer = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
encoded = transformer([input_ids, attention_mask])
logits = encoded[0]

model = tf.keras.models.Model(inputs=[input_ids, attention_mask], outputs=logits)

The rest of the code is identical for either method:

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])
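For completeness, a rough sketch of how the tokenized data might be fed to fit; the tokenizer call and the example texts/labels below are placeholders rather than my actual pipeline:

import numpy as np
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
texts = ["an example sentence", "another example sentence"]   # placeholder texts
labels = np.array([0, 1])                                      # placeholder labels

enc = tokenizer(texts, padding="max_length", truncation=True, max_length=128)
input_ids_arr = np.array(enc["input_ids"])
attention_mask_arr = np.array(enc["attention_mask"])

# For the Method 2 model built above, the arrays are passed as a list
# in the same order as its Input layers.
model.fit([input_ids_arr, attention_mask_arr], labels, epochs=1, batch_size=2)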

I am using TensorFlow 2.3.0 and have tried transformers versions 3.5.0 and 4.0.0.

Upvotes: 3

Views: 1945

Answers (1)

Deepdelusion

Reputation: 131

Answering my own question here. I filed a bug report on the Hugging Face GitHub repository, and the issue was fixed in the dev version (4.1.0.dev0 as of December 2020). The snippet below now works as expected:

input_ids = tf.keras.Input(shape=(128,), dtype='int32')
attention_mask = tf.keras.Input(shape=(128,), dtype='int32')

transformer = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
encoded = transformer({"input_ids": input_ids, "attention_mask": attention_mask})
logits = encoded[0]

model = tf.keras.models.Model(inputs={"input_ids": input_ids, "attention_mask": attention_mask}, outputs=logits)
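With that fix in place, the original goal (adding my own layers instead of using the classifier head's logits) can be approached along these lines. This is only a sketch: the use of TFRobertaModel for the bare encoder, the dropout rate, and the extra dense layer size are illustrative choices, not part of the fix itself.

from transformers import TFRobertaModel

input_ids = tf.keras.Input(shape=(128,), dtype='int32', name='input_ids')
attention_mask = tf.keras.Input(shape=(128,), dtype='int32', name='attention_mask')

# Bare encoder without the sequence-classification head
roberta = TFRobertaModel.from_pretrained("roberta-base")
outputs = roberta({"input_ids": input_ids, "attention_mask": attention_mask})
cls_state = outputs[0][:, 0, :]                        # hidden state of the <s> token

x = tf.keras.layers.Dropout(0.1)(cls_state)            # illustrative dropout rate
x = tf.keras.layers.Dense(256, activation='relu')(x)   # illustrative extra layer
logits = tf.keras.layers.Dense(6)(x)                   # 6 labels, as in the question

model = tf.keras.models.Model(
    inputs={"input_ids": input_ids, "attention_mask": attention_mask},
    outputs=logits)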

Upvotes: 3
