Deepdelusion

Reputation: 131

How to mix tensorflow keras model and transformers

I am trying to import a pretrained model from Hugging Face's transformers library and extend it with a few layers for classification using tensorflow keras. When I use the transformers model directly (Method 1), it trains well and reaches a validation accuracy of 0.93 after 1 epoch. However, when I wrap the same model as a layer inside a tf.keras model (Method 2), accuracy never gets above 0.32. As far as I can tell from the documentation, the two approaches should be equivalent. My goal is to get Method 2 working so that I can add more layers to it instead of directly using the logits produced by Hugging Face's classifier head, but I'm stuck at this stage.

import tensorflow as tf
from transformers import TFRobertaForSequenceClassification

Method 1:

model = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)

Method 2:

input_ids = tf.keras.Input(shape=(128,), dtype='int32')
attention_mask = tf.keras.Input(shape=(128,), dtype='int32')

transformer = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
encoded = transformer([input_ids, attention_mask])
logits = encoded[0]

model = tf.keras.models.Model(inputs=[input_ids, attention_mask], outputs=logits)

The rest of the code is identical for either method:

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])
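For completeness, a rough sketch of how the tokenized data might be fed to fit; the tokenizer call and the example texts/labels below are placeholders rather than my actual pipeline:

import numpy as np
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
texts = ["an example sentence", "another example sentence"]   # placeholder texts
labels = np.array([0, 1])                                      # placeholder labels

enc = tokenizer(texts, padding="max_length", truncation=True, max_length=128)
input_ids_arr = np.array(enc["input_ids"])
attention_mask_arr = np.array(enc["attention_mask"])

# For the Method 2 model built above, the arrays are passed as a list
# in the same order as its Input layers.
model.fit([input_ids_arr, attention_mask_arr], labels, epochs=1, batch_size=2)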

I am using TensorFlow 2.3.0 and have tried transformers versions 3.5.0 and 4.0.0.

Upvotes: 3

Views: 1945

Answers (1)

Deepdelusion

Reputation: 131

Answering my own question here. I filed a bug report on the Hugging Face GitHub repository, and the issue was fixed in the dev version (4.1.0.dev0 as of December 2020). The snippet below now works as expected:

input_ids = tf.keras.Input(shape=(128,), dtype='int32')
attention_mask = tf.keras.Input(shape=(128,), dtype='int32')

transformer = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
encoded = transformer({"input_ids": input_ids, "attention_mask": attention_mask})
logits = encoded[0]

model = tf.keras.models.Model(inputs={"input_ids": input_ids, "attention_mask": attention_mask}, outputs=logits)
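With that fix in place, the original goal (adding my own layers instead of using the classifier head's logits) can be approached along these lines. This is only a sketch: the use of TFRobertaModel for the bare encoder, the dropout rate, and the extra dense layer size are illustrative choices, not part of the fix itself.

from transformers import TFRobertaModel

input_ids = tf.keras.Input(shape=(128,), dtype='int32', name='input_ids')
attention_mask = tf.keras.Input(shape=(128,), dtype='int32', name='attention_mask')

# Bare encoder without the sequence-classification head
roberta = TFRobertaModel.from_pretrained("roberta-base")
outputs = roberta({"input_ids": input_ids, "attention_mask": attention_mask})
cls_state = outputs[0][:, 0, :]                        # hidden state of the <s> token

x = tf.keras.layers.Dropout(0.1)(cls_state)            # illustrative dropout rate
x = tf.keras.layers.Dense(256, activation='relu')(x)   # illustrative extra layer
logits = tf.keras.layers.Dense(6)(x)                   # 6 labels, as in the question

model = tf.keras.models.Model(
    inputs={"input_ids": input_ids, "attention_mask": attention_mask},
    outputs=logits)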

Upvotes: 3
