user24606639

Reputation: 1

ValueError: Dimension must be 2 but is 3 (keras Attention)

I'm trying to re-implement PAtt-Lite (https://github.com/JLREx/PAtt-Lite/blob/main/patt-lite-notebook.ipynb) on the FER-2013 dataset.

My data shapes (48x48 images, RGB):

Shape of X_train: (28709, 48, 48, 3)
Shape of y_train: (28709,)
Shape of X_valid: (3589, 48, 48, 3)
Shape of y_valid: (3589,)
Shape of X_test: (3589, 48, 48, 3)
Shape of y_test: (3589,)

An overview of the steps in the model, taken from PAtt-Lite:


inputs = input_layer
x = sample_resizing(inputs)
x = data_augmentation(x)
x = preprocess_input(x)
x = base_model(x, training=False)
x = patch_extraction(x)
x = global_average_layer(x)
x = tf.keras.layers.Dropout(TRAIN_DROPOUT)(x)
x = pre_classification(x)      # outputs shape (None, 32)
x = self_attention([x, x])     # tf.keras.layers.Attention, query = value = x

outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs, name='train-head')
model.compile(optimizer=keras.optimizers.Adam(learning_rate=TRAIN_LR, global_clipnorm=3.0), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
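For reference, the layers used above are set up essentially as in the PAtt-Lite notebook; paraphrased roughly like this (the exact arguments may differ, and NUM_CLASSES is just my placeholder for the number of emotion classes):

# Simplified versions of the layers referenced above (paraphrased from the
# notebook; exact arguments may differ)
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
pre_classification = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.BatchNormalization()
], name='pre_classification')                                  # -> (None, 32)
self_attention = tf.keras.layers.Attention(use_scale=True, name='attention')
prediction_layer = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')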

The problem occurs when model.fit is called (implemented the same way as in PAtt-Lite). If self_attention is commented out, it works; if not, I get this error:

Exception encountered when calling Attention.call().

Dimension must be 2 but is 3 for '{{node train-head_1/attention_1/transpose}} = Transpose[T=DT_FLOAT, Tperm=DT_INT32](train-head_1/pre_classification_1/batch_normalization_50_1/batchnorm/add_1, train-head_1/attention_1/transpose/perm)' with input shapes: [?,32], [3].

Arguments received by Attention.call():
  • inputs=['tf.Tensor(shape=(None, 32), dtype=float32)', 'tf.Tensor(shape=(None, 32), dtype=float32)']
  • mask=['None', 'None']
  • training=True
  • return_attention_scores=False
  • use_causal_mask=False
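As far as I can tell, a minimal reproduction of the same error is simply calling the Attention layer on rank-2 tensors, which is what the model above effectively does (this is my own reduction, not code from the notebook):

import tensorflow as tf

# Minimal reduction (my own guess at the cause): query/value of shape
# (batch, dim) with no sequence axis, like the output of pre_classification
q = tf.random.normal((4, 32))
attn = tf.keras.layers.Attention()
out = attn([q, q])   # raises the same "Dimension must be 2 but is 3" transpose error for me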

And I have no idea where the [3] comes from or how to solve it. Can anybody help me?

Thanks in advance

I tried looking at the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention), which says the layer expects a query tensor of shape (batch_size, Tq, dim) and a value tensor of shape (batch_size, Tv, dim), but I don't understand what these values should be here, since the shape of x just before calling self_attention is (None, 32).
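The only interpretation I can come up with is that x would need an explicit sequence axis before the attention, something like this (my own guess, not what the PAtt-Lite notebook does):

# Guesswork, not from the notebook: add a length-1 sequence axis so the
# inputs match (batch_size, Tq, dim) = (None, 1, 32), then flatten again
x = tf.keras.layers.Reshape((1, 32))(x)   # (None, 32) -> (None, 1, 32)
x = self_attention([x, x])                # query = value = (None, 1, 32)
x = tf.keras.layers.Reshape((32,))(x)     # back to (None, 32) for prediction_layer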

I also found the question "ValueError: Dimension must be 4 but is 3", but I don't think it answers my question, since it does not use tf.keras.layers.Attention.

EDIT: I downgraded to TensorFlow 2.10, and model.fit runs now.

Upvotes: 0

Views: 143

Answers (0)
