Reputation: 739
I am using the bert-for-tf2 library for a multi-class classification problem. I created the model, but training throws the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-25-d9f382cba5d4> in <module>()
----> 1 model.fit([INPUT_IDS,INPUT_MASKS,INPUT_SEGS], list(train.SECTION))
5 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/data_adapter.py in
__init__(self, x, y, sample_weights, batch_size, epochs, steps, shuffle, **kwargs)
243 label, ", ".join([str(i.shape[0]) for i in nest.flatten(data)]))
244 msg += "Please provide data which shares the same first dimension."
--> 245 raise ValueError(msg)
246 num_samples = num_samples.pop()
247
ValueError: Data cardinality is ambiguous:
x sizes: 3
y sizes: 6102
Please provide data which shares the same first dimension.
I am following the Medium article called Simple BERT using TensorFlow 2.0. The Git repo for the bert-for-tf2 library can be found here.
Please find the entire code here.
Here is a link to my Colab notebook.
I really appreciate your help!
Upvotes: 7
Views: 20804
Reputation: 11
I had the same issue. It is not obvious why the number of inputs and outputs should match, but this error is raised by one of the Keras data adapters when x.shape[0] != y.shape[0]. In this case:
x = [INPUT_IDS, INPUT_MASKS, INPUT_SEGS]
y = list(train.SECTION)
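A rough NumPy sketch of what goes wrong, assuming the three inputs are plain Python lists (the sizes below are hypothetical stand-ins for the original data): when the outer list of lists is converted to a single array, its first dimension becomes 3 (the number of inputs) rather than the number of samples, which is exactly the "x sizes: 3" the error reports.

```python
import numpy as np

# Hypothetical toy data standing in for INPUT_IDS / INPUT_MASKS / INPUT_SEGS:
# each one is a plain Python list with n_samples rows of token ids.
n_samples, seq_len = 6102, 128
input_ids = [[0] * seq_len for _ in range(n_samples)]
input_masks = [[1] * seq_len for _ in range(n_samples)]
input_segs = [[0] * seq_len for _ in range(n_samples)]

# Converting the whole outer list yields one array of shape (3, 6102, 128),
# so the "first dimension" seen for x is 3, not 6102 -> cardinality mismatch.
x_ambiguous = np.asarray([input_ids, input_masks, input_segs])
print(x_ambiguous.shape[0])  # 3

# Converting each input separately keeps three arrays of size 6102 each,
# matching the 6102 labels in y.
x_fixed = [np.asarray(a) for a in (input_ids, input_masks, input_segs)]
print([a.shape[0] for a in x_fixed])  # [6102, 6102, 6102]
```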
So instead of
model.fit([INPUT_IDS, INPUT_MASKS, INPUT_SEGS], list(train.SECTION))
try passing the inputs and outputs as dictionaries keyed by layer name (check the model summary; suitable names can also be set explicitly). This worked for me:
model.fit(
    {
        "input_word_ids": INPUT_IDS,
        "input_mask": INPUT_MASKS,
        "segment_ids": INPUT_SEGS,
    },
    {"dense_1": list(train.SECTION)},
)
Also make sure the inputs and outputs are NumPy arrays (e.g. convert them with np.asarray()); the data adapter looks for a .shape attribute.
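To show the dict-based call end to end, here is a minimal sketch with a tiny stand-in model. The input names, shapes, and random data are assumptions chosen to mirror the BERT example, not the original model:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model reusing the input/output names from the answer above;
# seq_len, n_samples, and n_classes are arbitrary toy values.
seq_len, n_samples, n_classes = 8, 32, 3
inputs = {
    name: tf.keras.Input(shape=(seq_len,), dtype=tf.int32, name=name)
    for name in ("input_word_ids", "input_mask", "segment_ids")
}
merged = tf.keras.layers.Concatenate()(
    [tf.cast(t, tf.float32) for t in inputs.values()]
)
out = tf.keras.layers.Dense(n_classes, activation="softmax", name="dense_1")(merged)
model = tf.keras.Model(inputs=list(inputs.values()), outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

rng = np.random.default_rng(0)
ids = rng.integers(0, 100, size=(n_samples, seq_len)).astype(np.int32)
mask = np.ones((n_samples, seq_len), dtype=np.int32)
segs = np.zeros((n_samples, seq_len), dtype=np.int32)
labels = rng.integers(0, n_classes, size=(n_samples,))

# Keying by layer name removes the ambiguity: every value's first
# dimension is n_samples, so Keras sees one consistent cardinality.
history = model.fit(
    {"input_word_ids": ids, "input_mask": mask, "segment_ids": segs},
    {"dense_1": labels},
    epochs=1,
    verbose=0,
)
print(len(history.history["loss"]))  # 1
```

Matching dictionary keys to the layer names shown by model.summary() is what lets Keras pair each array with the right input, regardless of ordering.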
Upvotes: 1