
Cannot add CRF layer on top of BERT in keras for NER

I am hitting an error I cannot diagnose while training my BERT-CRF model for NER. I am using keras-contrib (keras_contrib) for the CRF layer.

Here are the imported libraries.

!pip install transformers
!pip install git+
import pandas as pd
import numpy as np
from transformers import TFBertModel, BertTokenizer, BertConfig
import tensorflow as tf
from tensorflow import keras
from keras_contrib.layers import CRF
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tqdm import tqdm

Code for the model creation.

input_ids = keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32)
token_type_ids = keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32)
attention_mask = keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32)
bert_output = bert(
bert_output = keras.layers.Dropout(0.3)(bert_output)
dense_layer_output = keras.layers.Dense(num_classes+1, activation='softmax', name='output')(bert_output)
crf = CRF(num_classes)
outputs = crf(dense_layer_output)
model = keras.Model(
       inputs=[input_ids, token_type_ids, attention_mask],
    validation_data=(x_test, y_test)

When I try to train the model, I get the error below. I cannot tell where it originates or why.

WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
AttributeError                            Traceback (most recent call last)
<ipython-input-18-f369b38eb91d> in <module>()
      5     verbose=1,
      6     batch_size=32,
----> 7     validation_data=(x_test, y_test)
      8 )

9 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ in wrapper(*args, **kwargs)
    975           except Exception as e:  # pylint:disable=broad-except
    976             if hasattr(e, "ag_error_metadata"):
--> 977               raise e.ag_error_metadata.to_exception(e)
    978             else:
    979               raise

AttributeError: in user code:

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/ train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/keras_contrib/losses/ crf_loss  *
        crf, idx = y_pred._keras_history[:2]

    AttributeError: 'Tensor' object has no attribute '_keras_history'

I have read that keras-contrib is deprecated, but I don't know any other way to use a CRF layer on top of BERT. If there is a better way of doing this in Keras, please suggest it.

I am not sure whether this question makes sense, but any help would be appreciated. Thanks in advance!

Answers (1)

Jayesh Bankoti

The easiest way is to use the CRF layer from TensorFlow Addons (tfa.layers.CRF) and then compute the loss from that layer's outputs.

import tensorflow_addons as tfa
crf = tfa.layers.CRF(len(num_labels)+1)

You can then wrap the base model in your own Model subclass that computes the CRF loss in a custom training step.

import tensorflow as tf
from tensorflow_addons.text.crf import crf_log_likelihood

def unpack_data(data):
    # Accept (x, y) or (x, y, sample_weight) tuples, as Keras passes to train_step.
    if len(data) == 2:
        return data[0], data[1], None
    elif len(data) == 3:
        return data
    else:
        raise TypeError("Expected data to be a tuple of size 2 or 3.")

class ModelWithCRFLoss(tf.keras.Model):
    """Wrapper around the base model for custom training logic."""

    def __init__(self, base_model):
        super().__init__()  # must run before assigning sublayers on a Keras Model
        self.base_model = base_model

    def call(self, inputs):
        return self.base_model(inputs)

    def compute_loss(self, x, y, sample_weight, training=False):
        y_pred = self(x, training=training)
        _, potentials, sequence_length, chain_kernel = y_pred

        # we now add the CRF loss:
        crf_loss = -crf_log_likelihood(potentials, y, sequence_length, chain_kernel)[0]

        if sample_weight is not None:
            crf_loss = crf_loss * sample_weight

        return tf.reduce_mean(crf_loss), sum(self.losses)

    def train_step(self, data):
        x, y, sample_weight = unpack_data(data)

        with tf.GradientTape() as tape:
            crf_loss, internal_losses = self.compute_loss(
                x, y, sample_weight, training=True
            )
            total_loss = crf_loss + internal_losses

        gradients = tape.gradient(total_loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

        return {"crf_loss": crf_loss, "internal_losses": internal_losses}

    def test_step(self, data):
        x, y, sample_weight = unpack_data(data)
        crf_loss, internal_losses = self.compute_loss(x, y, sample_weight)
        return {"crf_loss_val": crf_loss, "internal_losses_val": internal_losses}
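
To make the loss above less of a black box, here is a minimal pure-Python sketch of what a linear-chain CRF negative log-likelihood computes for a single sequence: the log of the summed scores over all tag paths (forward algorithm) minus the score of the gold path. This is not the tensorflow_addons implementation. The function names (log_sum_exp, path_score, log_partition, crf_negative_log_likelihood) and the toy numbers are made up for illustration, and batching, padding and sequence lengths are ignored.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(potentials, transitions, tags):
    """Unnormalized score of one tag path: unary scores plus transition scores."""
    score = potentials[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + potentials[t][tags[t]]
    return score


def log_partition(potentials, transitions):
    """Forward algorithm: log of the summed exp-scores over all tag paths."""
    num_tags = len(potentials[0])
    alpha = list(potentials[0])  # log-score of all paths ending in each tag
    for t in range(1, len(potentials)):
        alpha = [
            potentials[t][j]
            + log_sum_exp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            for j in range(num_tags)
        ]
    return log_sum_exp(alpha)


def crf_negative_log_likelihood(potentials, transitions, tags):
    """Negative log-likelihood of the gold tag path under a linear-chain CRF."""
    return log_partition(potentials, transitions) - path_score(
        potentials, transitions, tags
    )


# Toy example: 3 tokens, 2 tags (all numbers are hypothetical).
potentials = [[1.0, 0.2], [0.3, 1.5], [0.8, 0.1]]  # per-token tag scores
transitions = [[0.5, -0.2], [-0.4, 0.6]]  # transitions[i][j]: tag i -> tag j
gold_tags = [0, 1, 0]

nll = crf_negative_log_likelihood(potentials, transitions, gold_tags)
print(f"negative log-likelihood: {nll:.4f}")
```

In the wrapper class, potentials plays the role of the per-token scores and chain_kernel the role of the transition matrix. The loss is always positive because the gold path is only one of the paths summed in the partition term.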

You can then build and compile the model along these lines:

decoded_sequence, potentials, sequence_length, chain_kernel = crf(dense_layer_output, mask=attention_mask)

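base_model = tf.keras.Model(
    inputs=[input_ids, attention_mask],
    # Reconstructed (assumption): the outputs must match the four values
    # unpacked from y_pred in compute_loss.
    outputs=[decoded_sequence, potentials, sequence_length, chain_kernel],
)

model = ModelWithCRFLoss(base_model)
# Reconstructed (assumption): the optimizer argument below belongs to a compile call.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-3, epsilon=1e-08),
)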
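
For intuition about the first output, decoded_sequence: as far as I know the layer produces it by Viterbi decoding, i.e. by picking the highest-scoring tag path given the per-token scores and the transition matrix. Below is a minimal pure-Python sketch of that idea for one sequence. The function name viterbi_decode and the toy numbers are hypothetical, and this is an illustration rather than the library's code (no batching or masking).

```python
def viterbi_decode(potentials, transitions):
    """Return the highest-scoring tag path and its score for one sequence."""
    num_tags = len(potentials[0])
    scores = list(potentials[0])  # best score of any path ending in each tag
    backpointers = []  # for each step, the best previous tag per current tag

    for t in range(1, len(potentials)):
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Best previous tag i for landing on tag j at step t.
            best_prev = max(
                range(num_tags), key=lambda i: scores[i] + transitions[i][j]
            )
            step_back.append(best_prev)
            step_scores.append(
                scores[best_prev] + transitions[best_prev][j] + potentials[t][j]
            )
        scores = step_scores
        backpointers.append(step_back)

    # Follow the backpointers from the best final tag to recover the path.
    best_last = max(range(num_tags), key=lambda j: scores[j])
    path = [best_last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path, scores[best_last]


# Toy example: 3 tokens, 2 tags (all numbers are hypothetical).
potentials = [[1.0, 0.2], [0.3, 1.5], [0.8, 0.1]]
transitions = [[0.5, -0.2], [-0.4, 0.6]]

best_path, best_score = viterbi_decode(potentials, transitions)
print(best_path, best_score)
```

With these toy numbers the decoder picks the path [0, 0, 0], even though tag 1 has the highest score for the middle token on its own. The transition scores make staying in tag 0 the better overall path, which is the kind of label consistency a CRF adds on top of per-token classification.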
