Loss problem when training a multi-modal model

Question

I have a dataset that contains images and text. I use a BERT model for the text input and ResNet50 for the image input. When I try to train the combined model, I get an error. Please help.

I load my data using a generator. This is my data generator function:

def multi_modal_data_generator(df, labels, image_datagen, batch_size):
    .
    .
    yield {'image_input': images,
           'input_ids': np.array(input_ids),
           'token_type_ids': np.array(token_type_ids),
           'attention_mask': np.array(attention_mask)}, {'output': np.array(lbls)}

This is my model:

import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Input

# bert_model and num_classes are defined elsewhere (not shown)
def image_text_model():
  image_input = Input(shape=(224, 224, 3), name='image_input')

  input_ids = Input(shape=(96,), dtype=tf.int32, name='input_ids')
  token_type_ids = Input(shape=(96,), dtype=tf.int32, name='token_type_ids')
  attention_mask = Input(shape=(96,), dtype=tf.bool, name='attention_mask')

  # Use a frozen ResNet50 to extract features from the image input
  resnet_model = ResNet50(weights='imagenet', include_top=False)
  for layer in resnet_model.layers:
    layer.trainable = False
  image_output = resnet_model(image_input)
  image_output = GlobalAveragePooling2D()(image_output)

  # Use the BERT model to extract features from the text input
  text_features = bert_model(input_ids, attention_mask, token_type_ids)['logits']

  # Concatenate the image features and text features
  concatenated = tf.keras.layers.Concatenate()([
      image_output,
      text_features
  ])

  # Add a Dense layer
  x = Dense(units=128, activation='relu')(concatenated)

  # Add the output Dense layer with softmax activation for classification
  output = Dense(num_classes, activation='softmax')(x)

  # Define the model
  model = tf.keras.Model(inputs=[image_input, input_ids, attention_mask, token_type_ids], outputs=[output])
  return model

This is my compile and fit code:

new_model = image_text_model()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.CategoricalCrossentropy()
new_model.fit(train_generator, epochs=2)

This is my error:

ValueError: Found unexpected losses or metrics that do not correspond to any Model output: dict_keys(['output']). Valid mode output names: ['dense_18']. Received struct is: {'output': }.
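
Reading the message literally, the key 'output' in the label dict yielded by the generator does not match the model's only output name, 'dense_18', which looks like the auto-generated name of the last Dense layer. A likely fix, assuming the rest of the setup is as posted, is to pass name='output' to the final Dense layer, or to yield the labels under the layer's actual name. The sketch below only illustrates that name comparison in plain Python. `unexpected_label_keys` is a hypothetical helper, not part of Keras.

```python
def unexpected_label_keys(label_keys, output_names):
    """Return the label keys that match no model output name, in order."""
    valid = set(output_names)
    return [key for key in label_keys if key not in valid]


# Names taken from the error message above. With a real model, the second
# argument would typically come from `model.output_names`.
print(unexpected_label_keys(['output'], ['dense_18']))  # prints ['output']

# Assumption: the final Dense layer is created with name='output',
# so the model's output name matches the key the generator yields.
print(unexpected_label_keys(['output'], ['output']))  # prints []
```

An empty list means every label key has a matching output, which is the condition the error message is complaining about.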

Answers (0)
