I have a pandas DataFrame where `col1` is the input text (tokenized with a pre-trained tokenizer) and `col2` is the binary classification label (0 or 1).

I convert it into a TensorFlow dataset:

```python
import tensorflow as tf

# Here `dataset` is the pandas DataFrame described above.
dataset = tf.data.Dataset.from_generator(lambda: dataset, output_types=(tf.string, tf.int32))
```
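Creating the model:

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  registers ops used by the BERT preprocessing model

# tfhub_handle_preprocess and tfhub_handle_encoder hold the TF Hub handles of the
# preprocessing model and the BERT encoder (defined elsewhere in the notebook).
def build_classifier_model():
    # Input(shape=()) means each example is a single string; Keras adds the batch dimension.
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
    preprocessing_layer = hub.KerasLayer(tfhub_handle_preprocess, name='preprocessing')
    encoder_inputs = preprocessing_layer(text_input)
    encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name='BERT_encoder')
    outputs = encoder(encoder_inputs)
    net = outputs['pooled_output']
    net = tf.keras.layers.Dropout(0.2)(net)
    # One raw logit per example (no activation).
    net = tf.keras.layers.Dense(1, activation=None, name='classifier')(net)
    return tf.keras.Model(text_input, net)

classifier_model = build_classifier_model()
```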
Fine-tuning the model (BERT):

```python
from official.nlp import optimization  # from the tf-models-official package

epochs = 5
steps_per_epoch = tf.data.experimental.cardinality(dataset).numpy()
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1 * num_train_steps)

init_lr = 3e-5
optimizer = optimization.create_optimizer(init_lr=init_lr,
                                          num_train_steps=num_train_steps,
                                          num_warmup_steps=num_warmup_steps,
                                          optimizer_type='adamw')
```
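A quick way to see why the `steps_per_epoch` computation above is suspect: `tf.data.experimental.cardinality` cannot count the elements of a dataset built with `from_generator`, so it returns the sentinel `tf.data.experimental.UNKNOWN_CARDINALITY` (-2) instead of a row count, which makes `num_train_steps` negative. The sketch below assumes a small hypothetical DataFrame `df` with the question's `col1`/`col2` layout; the generator `rows` is likewise only illustrative.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for the asker's data: col1 = text, col2 = 0/1 label.
df = pd.DataFrame({"col1": ["great film", "awful film", "fine film"],
                   "col2": [1, 0, 1]})

def rows():
    # Yield one (text, label) pair per row.
    for text, label in zip(df["col1"], df["col2"]):
        yield text, label

gen_ds = tf.data.Dataset.from_generator(
    rows,
    output_signature=(tf.TensorSpec(shape=(), dtype=tf.string),
                      tf.TensorSpec(shape=(), dtype=tf.int32)),
)
slice_ds = tf.data.Dataset.from_tensor_slices((df["col1"].tolist(), df["col2"].tolist()))

# A generator-backed dataset has unknown length, so cardinality is the sentinel -2.
gen_card = int(tf.data.experimental.cardinality(gen_ds))
# A tensor-sliced dataset knows its length: one element per row...
slice_card = int(tf.data.experimental.cardinality(slice_ds))
# ...and after batching it reports the number of batches per epoch.
batched_card = int(tf.data.experimental.cardinality(slice_ds.batch(2)))
print(gen_card, slice_card, batched_card)  # -2 3 2
```

Computing `steps_per_epoch` from a batched, tensor-sliced dataset therefore gives the number of batches per epoch rather than a negative value.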
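Compiling the model:

```python
# loss and metrics are defined elsewhere; for a single-logit binary classifier
# they would typically be, e.g.:
#   loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
#   metrics = tf.metrics.BinaryAccuracy()
classifier_model.compile(optimizer=optimizer,
                         loss=loss,
                         metrics=metrics)
```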
Then I start training the model with a custom loop:

```python
from tqdm import tqdm

for epoch in range(5):
    for step, (x_batch_train, y_batch_train) in tqdm(enumerate(dataset)):
        with tf.GradientTape() as tape:
            logits = classifier_model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)  # loss_fn is defined elsewhere
        grads = tape.gradient(loss_value, classifier_model.trainable_weights)
        optimizer.apply_gradients(zip(grads, classifier_model.trainable_weights))
        print(step)
        if step % 200 == 0:
            print('loss_value %s: %s' % (step, float(loss_value)))
```
I run this in Colab Pro with a GPU, and the cell that performs the training freezes without training the model. The only output is:
0it [00:00, ?it/s]
Please help me get the model training. (When I tried the `.fit` method, `model.fit()`, the result was the same.)
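I think the problem is in the way you create your Dataset. `from_generator` expects a callable that returns an iterable yielding one `(text, label)` pair per element, and `lambda: dataset` does not do that: the lambda looks up the name `dataset` only when TensorFlow calls it, and by then `dataset` has been reassigned to the new `tf.data.Dataset`. Even if it did return the DataFrame, iterating over a DataFrame yields its column labels, not its rows.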
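To see the second point concretely, here is a minimal pandas-only sketch; the two-row DataFrame `df` is hypothetical and only mirrors the question's `col1`/`col2` layout.

```python
import pandas as pd

# Hypothetical two-row stand-in for the question's DataFrame.
df = pd.DataFrame({"col1": ["great film", "awful film"], "col2": [1, 0]})

# Iterating a DataFrame directly yields its column labels, not its rows.
column_labels = list(df)
print(column_labels)  # ['col1', 'col2']

# Row-wise (text, label) pairs have to be built explicitly, e.g. by zipping the columns.
pairs = list(zip(df["col1"], df["col2"]))
print(pairs)
```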
Try accessing an element or inspecting the dataset to see whether it works, for example:

```python
next(iter(dataset))
```

and check whether it returns the first sample and its label.
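If you prefer to keep `from_generator`, a minimal sketch under the same assumptions (a hypothetical `df` with `col1` text and `col2` 0/1 labels) passes a function that yields one (text, label) pair per row, declares the element types with `output_signature`, and then runs the `next(iter(...))` check. The name `row_generator` is illustrative.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for the question's DataFrame.
df = pd.DataFrame({"col1": ["great film", "awful film"], "col2": [1, 0]})

def row_generator():
    # Called by TensorFlow each time the dataset is iterated; yields (text, label) per row.
    for text, label in zip(df["col1"], df["col2"]):
        yield text, label

gen_ds = tf.data.Dataset.from_generator(
    row_generator,  # pass the function itself; TensorFlow calls it when iterating
    output_signature=(
        tf.TensorSpec(shape=(), dtype=tf.string),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# Sanity check: the first element should be a (text, label) pair.
text, label = next(iter(gen_ds))
print(text.numpy(), label.numpy())
```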
When you create a TensorFlow Dataset from a pandas DataFrame, you should use `tf.data.Dataset.from_tensor_slices()`. Here is an example:

```python
dataset = tf.data.Dataset.from_tensor_slices(dict(dataset))
```
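`dict(dataset)` yields one feature dictionary per row, with the label as just another key. The training loop in the question unpacks each element as `(x_batch_train, y_batch_train)`, and the model's `Input(shape=())` expects a batch of strings, so a batched (text, label) tuple is usually the better fit. A minimal sketch, assuming a hypothetical `df` and an assumed batch size of 32:

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for the question's DataFrame.
df = pd.DataFrame({"col1": ["great film", "awful film", "fine film"],
                   "col2": [1, 0, 1]})

# Slice the columns into (text, label) pairs, one element per row.
ds = tf.data.Dataset.from_tensor_slices((df["col1"].tolist(), df["col2"].tolist()))

# Shuffle and batch so each step receives a 1-D batch of strings and a 1-D batch of labels.
batch_size = 32  # assumed value; tune for your GPU memory
train_ds = ds.shuffle(len(df)).batch(batch_size).prefetch(tf.data.AUTOTUNE)

x_batch, y_batch = next(iter(train_ds))
print(x_batch.shape, y_batch.shape)
```

The same `train_ds` can be passed to `classifier_model.fit(train_ds, epochs=epochs)` or iterated in the custom loop.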