I'm learning to use bert-base-cased with a classification model. The code for the model is the following:
def map_func(input_ids, masks, labels):
    return {'input_ids': input_ids, 'attention_mask': masks}, labels
dataset = dataset.map(map_func)
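When the elements of a tf.data.Dataset are tuples, Dataset.map unpacks each tuple into positional arguments, so the mapping function above receives the three tensors separately and returns an (inputs, label) pair. The dict keys have to match the name= of the Keras Input layers. A minimal sketch with random stand-in arrays; the real tokenized arrays are not shown, so ids, masks, labels and the 50-token length are assumptions:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 100 examples, 50 token ids each, 3 one-hot classes.
num_samples, seq_len, num_classes = 100, 50, 3
ids = np.random.randint(0, 1000, size=(num_samples, seq_len)).astype("int32")  # arbitrary token ids
masks = np.ones((num_samples, seq_len), dtype="int32")
labels = np.eye(num_classes, dtype="float32")[np.random.randint(0, num_classes, size=num_samples)]

def map_func(input_ids, masks, labels):
    # Keys must match the `name=` of the Keras Input layers.
    return {"input_ids": input_ids, "attention_mask": masks}, labels

dataset = tf.data.Dataset.from_tensor_slices((ids, masks, labels)).map(map_func)

# Per-example shapes, before batching.
features_spec, label_spec = dataset.element_spec
print(features_spec["input_ids"].shape, label_spec.shape)
```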
BATCH_SIZE = 32
dataset = dataset.shuffle(100000).batch(BATCH_SIZE)
split = .8
ds_len = len(list(dataset))
train = dataset.take(round(ds_len * split))
val = dataset.skip(round(ds_len * split))
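len(list(dataset)) iterates over the whole pipeline just to count the batches. For a dataset built with from_tensor_slices, map, shuffle and batch, dataset.cardinality() usually reports the same count without iterating (a swapped-in technique, not the original code). Also note that shuffle() reshuffles on every iteration by default, so take()/skip() after it can put different examples into train and val on each epoch; passing reshuffle_each_iteration=False keeps the split fixed. A sketch with 1,000 hypothetical examples:

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 32
split = 0.8

# Hypothetical stand-in data: 1,000 examples of 50 token ids each.
ids = np.zeros((1000, 50), dtype="int32")
labels = np.zeros((1000, 3), dtype="float32")
dataset = tf.data.Dataset.from_tensor_slices((ids, labels))

# Shuffle once with a fixed order so train and val do not mix between epochs.
dataset = dataset.shuffle(1000, seed=42, reshuffle_each_iteration=False).batch(BATCH_SIZE)

ds_len = int(dataset.cardinality().numpy())  # number of batches: ceil(1000 / 32) = 32
n_train = round(ds_len * split)              # round(25.6) = 26 batches
train = dataset.take(n_train)
val = dataset.skip(n_train)
```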
import tensorflow as tf
from transformers import TFAutoModel
bert = TFAutoModel.from_pretrained('bert-base-cased')
Model: "tf_bert_model"
bert (TFBertMainLayer)      multiple                  108310272
=================================================================
Total params: 108,310,272
Trainable params: 108,310,272
Non-trainable params: 0
Then the NN building:
input_ids = tf.keras.layers.Input(shape=(50,), name='input_ids', dtype='int32')
mask = tf.keras.layers.Input(shape=(50,), name='attention_mask', dtype='int32')
embeddings = bert(input_ids, attention_mask=mask)[0]
X = tf.keras.layers.GlobalMaxPool1D()(embeddings)
X = tf.keras.layers.BatchNormalization()(X)
X = tf.keras.layers.Dense(128, activation='relu')(X)
X = tf.keras.layers.Dropout(0.1)(X)
X = tf.keras.layers.Dense(32, activation='relu')(X)
y = tf.keras.layers.Dense(3, activation='softmax',name='outputs')(X)
model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)
model.layers[2].trainable = False  # freeze tf_bert_model (layer index 2)
The model.summary() output is:
Layer (type)                                Output Shape                                      Param #     Connected to
==================================================================================================
input_ids (InputLayer)                      [(None, 50)]                                      0           []
attention_mask (InputLayer)                 [(None, 50)]                                      0           []
tf_bert_model (TFBertModel)                 TFBaseModelOutputWithPoolingAndCrossAttentions(   108310272   ['input_ids[0][0]',
                                              last_hidden_state=(None, 50, 768),                           'attention_mask[0][0]']
                                              pooler_output=(None, 768),
                                              past_key_values=None, hidden_states=None,
                                              attentions=None, cross_attentions=None)
global_max_pooling1d (GlobalMaxPooling1D)   (None, 768)                                       0           ['tf_bert_model[0][0]']
batch_normalization (BatchNormalization)    (None, 768)                                       3072        ['global_max_pooling1d[0][0]']
dense (Dense)                               (None, 128)                                       98432       ['batch_normalization[0][0]']
dropout_37 (Dropout)                        (None, 128)                                       0           ['dense[0][0]']
dense_1 (Dense)                             (None, 32)                                        4128        ['dropout_37[0][0]']
outputs (Dense)                             (None, 3)                                         99          ['dense_1[0][0]']
==================================================================================================
Total params: 108,416,003
Trainable params: 104,195
Non-trainable params: 108,311,808
__________________________________________________________________________________________________
finally the model fitting is
optimizer = tf.keras.optimizers.Adam(0.01)
loss = tf.keras.losses.CategoricalCrossentropy()
acc = tf.keras.metrics.CategoricalAccuracy('accuracy')
model.compile(optimizer,loss=loss, metrics=[acc])
history = model.fit(
train,
validation_data = val,
epochs=140
)
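CategoricalCrossentropy expects one-hot label vectors of shape (batch, 3). If the labels stored in the dataset are integer class ids instead (the question doesn't show how they were built, so this is an assumption), either one-hot encode them or switch to SparseCategoricalCrossentropy with SparseCategoricalAccuracy. The two losses agree, as this small sketch with made-up probabilities checks:

```python
import numpy as np
import tensorflow as tf

# Made-up softmax outputs for 2 examples and 3 classes.
probs = tf.constant([[0.7, 0.2, 0.1],
                     [0.1, 0.3, 0.6]])
int_labels = tf.constant([0, 2])                 # integer class ids
onehot_labels = tf.one_hot(int_labels, depth=3)  # shape (2, 3)

cce = tf.keras.losses.CategoricalCrossentropy()(onehot_labels, probs)
scce = tf.keras.losses.SparseCategoricalCrossentropy()(int_labels, probs)

# Both equal the mean negative log-probability of the true class.
expected = -np.mean(np.log([0.7, 0.6]))
print(float(cce), float(scce), expected)
```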
The model.fit(...) call fails with this error:
ValueError: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 50), found shape=(None, 1, 512)
Can anyone help me understand what I did wrong and why? Thanks :)
Update: here is the Git repository with the code: https://github.com/CharlieArreola/OnlinePosts
Answer:
It seems that the shape of your training data doesn't match the shape your input layer expects.
You can check the shape of the training data with train.element_spec (a tf.data.Dataset has no shape() method).
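For a tf.data.Dataset, element_spec describes the shape and dtype of each element. A sketch that reproduces the shape from the error message, assuming (hypothetically) that each sentence was tokenized separately to 512 tokens so that every stored example has shape (1, 512):

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 64 examples, each stored as (1, 512).
ids = np.zeros((64, 1, 512), dtype="int32")
masks = np.ones((64, 1, 512), dtype="int32")
labels = np.zeros((64, 3), dtype="float32")

def map_func(input_ids, masks, labels):
    return {"input_ids": input_ids, "attention_mask": masks}, labels

train = tf.data.Dataset.from_tensor_slices((ids, masks, labels)).map(map_func).batch(32)

# Batched shapes; the leading None is the batch axis.
features_spec, label_spec = train.element_spec
print(features_spec["input_ids"].shape)
```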
Your input layer
input_ids = tf.keras.layers.Input(shape=(50,), name='input_ids', dtype='int32')
expects training data with 50 columns (tokens per sequence), but judging by your error you most likely have 512.
So to fix this, you could simply change the input shape:
input_ids = tf.keras.layers.Input(shape=(512,), name='input_ids', dtype='int32')
If you split your dataset into x and y arrays, you can make it more flexible by reading the sequence length (the second dimension) from the data:
input_ids = tf.keras.layers.Input(shape=(train_x.shape[1],), name='input_ids', dtype='int32')
Also don't forget that you have to make this change to all of your input layers, including attention_mask!
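The error message also shows an extra axis of size 1, (None, 1, 512) rather than (None, 512), so changing the Input shape to (512,) may not be enough on its own. One option, assuming the examples really are stored as (1, 512), is to drop that axis in the map function and then read the sequence length from element_spec when building the inputs. A sketch with hypothetical zero-filled data:

```python
import numpy as np
import tensorflow as tf

# Hypothetical data with the extra axis seen in the error: (num_examples, 1, 512).
ids = np.zeros((64, 1, 512), dtype="int32")
masks = np.ones((64, 1, 512), dtype="int32")
labels = np.zeros((64, 3), dtype="float32")

def map_func(input_ids, masks, labels):
    # Each element arrives as (1, 512); squeeze the leading axis to get (512,).
    input_ids = tf.squeeze(input_ids, axis=0)
    masks = tf.squeeze(masks, axis=0)
    return {"input_ids": input_ids, "attention_mask": masks}, labels

train = tf.data.Dataset.from_tensor_slices((ids, masks, labels)).map(map_func).batch(32)

# Read the sequence length from the data instead of hard-coding 50 or 512.
seq_len = train.element_spec[0]["input_ids"].shape[-1]
input_ids = tf.keras.layers.Input(shape=(seq_len,), name="input_ids", dtype="int32")
mask = tf.keras.layers.Input(shape=(seq_len,), name="attention_mask", dtype="int32")
```

Squeezing before batch() keeps the per-example shape at (512,), which is what both the Keras Input layers and the BERT layer expect after batching.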