Bob John

Reputation: 1

TensorFlow model doesn't train (fit hangs)

I have a pandas DataFrame where col1 is the input text (to be tokenized with a pre-trained tokenizer) and col2 is the binary classification label (0 or 1). I convert it into a TensorFlow dataset:

dataset = tf.data.Dataset.from_generator(lambda: dataset, output_types=(tf.string, tf.int32))

Create the model:

def build_classifier_model():
   text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
   preprocessing_layer = hub.KerasLayer(tfhub_handle_preprocess, name='preprocessing')
   encoder_inputs = preprocessing_layer(text_input)
   encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name='BERT_encoder')
   outputs = encoder(encoder_inputs)
   net = outputs['pooled_output']
   net = tf.keras.layers.Dropout(0.2)(net)
   net = tf.keras.layers.Dense(1, activation=None, name='classifier')(net)
   return tf.keras.Model(text_input, net)
classifier_model = build_classifier_model()

Set up fine-tuning for the model (BERT):

epochs = 5
steps_per_epoch = tf.data.experimental.cardinality(dataset).numpy()
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1*num_train_steps)

init_lr = 3e-5 
optimizer = optimization.create_optimizer(init_lr=init_lr,
                                      num_train_steps=num_train_steps,
                                      num_warmup_steps=num_warmup_steps,
                                      optimizer_type='adamw')

Compile the model:

classifier_model.compile(optimizer=optimizer,
                     loss=loss,
                     metrics=metrics)

Then I start training the model with a loop:

from tqdm import tqdm
for epoch in range(5):
    for step, (x_batch_train, y_batch_train) in tqdm(enumerate(dataset)):
        with tf.GradientTape() as tape:
            logits = classifier_model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, classifier_model.trainable_weights)
        optimizer.apply_gradients(zip(grads, classifier_model.trainable_weights))
        print(step)
        if step % 200 == 0:
            print('loss_value %s: %s' % (step, float(loss_value)))

I run this in Colab Pro with a GPU. The cell that performs the training freezes and the model does not train. Output:

0it [00:00, ?it/s]

Please help me get the model training. (When I use model.fit() instead of the loop, the result is the same.)

Upvotes: 0

Views: 113

Answers (1)

Davide Anghileri

Reputation: 931

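I think the problem is in the way you create your Dataset. The from_generator function expects as its first argument a callable that returns an iterable of (text, label) elements. Your lambda returns dataset, and since that name is reassigned by the same statement, it refers to the new tf.data.Dataset itself by the time the lambda is called. Even a lambda that returned the original DataFrame would not work as intended, because iterating over a DataFrame yields its column names rather than its rows.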
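One detail of the question's from_generator line may be worth spelling out: a lambda looks up the names it uses when it is called, not when it is defined. The sketch below uses plain Python only; `rows`, `source`, and `row_generator` are hypothetical names chosen for illustration, and the exact symptom inside TensorFlow (a hang or an error) may differ.

```python
# Hypothetical toy data standing in for the (text, label) rows.
rows = [("good movie", 1), ("bad movie", 0)]

dataset = rows
source = lambda: dataset              # `dataset` is resolved when source() is called
dataset = "rebound to something else"  # same name reused, as in the question

# The lambda now returns the rebound object, not the original rows.
resolved_late = source()

# Keeping the data under its own name and yielding rows explicitly avoids this.
def row_generator(data=rows):
    for text, label in data:
        yield text, label

resolved_rows = list(row_generator())
```

Keeping the DataFrame and the tf.data.Dataset under different variable names sidesteps the problem entirely.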

Try accessing an element or inspecting the dataset to see if it works. For example:

next(iter(dataset))

and see whether it returns the first sample and label.

When you create a TensorFlow Dataset from a pandas DataFrame, you should use tf.data.Dataset.from_tensor_slices(). Here is an example:

dataset = tf.data.Dataset.from_tensor_slices(dict(dataset))
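Note that slicing a dict yields dictionaries of features, while the training loop in the question unpacks (x_batch_train, y_batch_train) pairs. A minimal sketch of a tuple-based variant follows, assuming the column names col1 and col2 from the question. The toy DataFrame `df`, the batch size of 2, and the name `train_ds` are assumptions made for illustration.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical toy frame standing in for the asker's data.
df = pd.DataFrame({
    "col1": ["good movie", "bad movie", "fine film"],
    "col2": [1, 0, 1],
})

# Iterating a DataFrame yields column labels, not rows.
column_labels = list(df)

# Slice a (texts, labels) tuple so each element is a (text, label) pair.
texts = df["col1"].tolist()
labels = df["col2"].to_numpy(dtype="int32")
train_ds = tf.data.Dataset.from_tensor_slices((texts, labels))

# Shuffle and batch; the batch size here is an arbitrary illustrative value.
train_ds = train_ds.shuffle(len(df)).batch(2)

# The number of batches per epoch is known for this kind of dataset.
steps_per_epoch = int(train_ds.cardinality().numpy())

# Sanity check: pull one batch, as suggested above.
first_texts, first_labels = next(iter(train_ds))
```

A dataset built this way reports a known cardinality, whereas one built with from_generator typically reports it as unknown (a negative sentinel value), which would also distort the num_train_steps and num_warmup_steps arithmetic in the question.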

Upvotes: 0
