How to use Keras datasets to create a bi-modal ANN for classifying 2 types of images?

Question

I'm trying to design a 2-channel (two-input) ANN that takes 2 types of images as inputs and classifies them into 5 classes:

spc_trn_dir/
...class_1/
......image_1.jpg
......image_2.jpg
...class_2/
......image_11.jpg
......image_12.jpg
...class_3/
......image_21.jpg
......image_22.jpg
...class_4/
......image_31.jpg
......image_32.jpg
...class_5/
......image_41.jpg
......image_42.jpg

scl_trn_dir/
...class_1/
......image_51.jpg
......image_52.jpg
...class_2/
......image_61.jpg
......image_62.jpg
...class_3/
......image_71.jpg
......image_72.jpg
...class_4/
......image_81.jpg
......image_82.jpg
...class_5/
......image_91.jpg
......image_92.jpg

I know that I can feed the ANN with NumPy arrays, but each dataset holds 8700 images of 150x150, so loading them all at once would exhaust the RAM and crash. Hence, I have to use Keras datasets.

I designed the following network:

Model: "model_9"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 img_input_1 (InputLayer)       [(None, 150, 150, 3  0           []                               
                                )]                                                                
                                                                                                  
 img_input_2 (InputLayer)       [(None, 150, 150, 3  0           []                               
                                )]                                                                
                                                                                                  
 model_2 (Functional)           (None, 1024)         2730040     ['img_input_1[0][0]']            
                                                                                                  
 model_3 (Functional)           (None, 1024)         2730040     ['img_input_2[0][0]']            
                                                                                                  
 concatenate_10 (Concatenate)   (None, 2048)         0           ['model_2[11][0]',               
                                                                  'model_3[11][0]']               
                                                                                                  
 dense_9 (Dense)                (None, 5)            10245       ['concatenate_10[0][0]']         
                                                                                                  
==================================================================================================
Total params: 5,470,325
Trainable params: 5,453,749
Non-trainable params: 16,576
__________________________________________________________________________________________________

I then created the datasets following the instructions in "Training & evaluation with the built-in methods" and the TensorFlow page for tf.data.Dataset:

image_size = (150, 150)
batch_size = 128

spc_train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    spc_trn_dir,
    image_size=image_size,
    batch_size=batch_size,
)


scl_train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    scl_trn_dir,
    image_size=image_size,
    batch_size=batch_size,
)

train_ds = tf.data.Dataset.zip((spc_train_ds,scl_train_ds))
val_ds = tf.data.Dataset.zip((spc_val_ds,scl_val_ds))

To illustrate what the zipped dataset yields:

(spc_img, spc_lbl), (scl_img, scl_lbl) = next(iter(train_ds))
print(f'shapes: image batch: {spc_img.shape} , labels: {spc_lbl.shape}')
print(f'shapes: image batch: {scl_img.shape} , labels: {scl_lbl.shape}')

shapes: image batch: (128, 150, 150, 3) , labels: (128,)

When I try to fit the model to the dataset, I get the following warning and error:

multi_modal_model.fit(train_ds, epochs=1, validation_data=val_ds)

WARNING:tensorflow:Model was constructed with shape (None, 150, 150, 3) for input KerasTensor(type_spec=TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32, name='img_input_2'), name='img_input_2', description="created by layer 'img_input_2'"), but it was called on an input with incompatible shape (None,).
WARNING:tensorflow:Model was constructed with shape (None, 150, 150, 3) for input KerasTensor(type_spec=TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32, name='input_6'), name='input_6', description="created by layer 'input_6'"), but it was called on an input with incompatible shape (None,).

    ValueError: Exception encountered when calling layer "model_3" (type Functional).
    
    Input 0 of layer "conv2d_12" is incompatible with the layer: expected min_ndim=4, found ndim=1. Full shape received: (None,)
    
    Call arguments received by layer "model_3" (type Functional):
      • inputs=tf.Tensor(shape=(None,), dtype=float32)
      • training=True
      • mask=None

I understand that it's taking the label of the 1st channel as the input to the 2nd one, but I don't know how to correct this, so any help is appreciated. Also, if you could point me to a smarter way of defining the datasets for this 2-channel ANN, I would be grateful.

Mohammad Ahmed · Accepted Answer

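The problem is in what you pass to fit: the dataset elements have the wrong structure. Keras reads each element as an (inputs, targets) pair, so with your zipped dataset the first pair (spc_img, spc_lbl) is taken as the two model inputs, and the label batch of shape (None,) ends up in img_input_2. I recreated your setup with random data; please check it and let me know if it works.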

import tensorflow as tf

# Dummy inputs standing in for one image from each directory
x1 = tf.random.normal((1, 150, 150, 3))
x2 = tf.random.normal((1, 150, 150, 3))

# Dummy integer labels in [0, 5); only y1 is used below, since both inputs share one label
y1 = tf.random.uniform((1, 1), minval=0, maxval=5, dtype=tf.int32)
y2 = tf.random.uniform((1, 1), minval=0, maxval=5, dtype=tf.int32)

# Here you have two datasets with the same labels but different images.
# Zipping them as they are is not enough: the elements have to be rearranged
# into ((x1, x2), y). The two small functions below pick out the images and
# the labels from an (x, y) element so that they can be regrouped.
dataset1 = tf.data.Dataset.from_tensor_slices((x1, y1))
dataset2 = tf.data.Dataset.from_tensor_slices((x2, y2))

identity_map_x = lambda x, y: x
identity_map_y = lambda x, y: y

# Image-only views of both datasets
x1 = dataset1.map(identity_map_x, num_parallel_calls=tf.data.AUTOTUNE)
x2 = dataset2.map(identity_map_x, num_parallel_calls=tf.data.AUTOTUNE)

# Label-only view, taken from the first dataset
y = dataset1.map(identity_map_y, num_parallel_calls=tf.data.AUTOTUNE)

# Group the two inputs, then pair them with the labels: ((x1, x2), y)
data = tf.data.Dataset.zip((x1, x2))
dataset = tf.data.Dataset.zip((data, y))

dataset = dataset.batch(1)

# Your optimizer; model stands for the two-input model from the question
model.compile(optimizer='Adam', loss='sparse_categorical_crossentropy')

# Try to fit the model
model.fit(dataset)

Output:

1/1 [==============================] - 2s 2s/step - loss: 9.5816
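
The same rearrangement can probably be applied directly to the zipped, already batched datasets from the question with a single map call. The following is only a minimal sketch under stated assumptions: the two small synthetic datasets stand in for the outputs of image_dataset_from_directory, the helper name to_two_input_element and the sizes are hypothetical, and both datasets are assumed to yield samples in the same label order.

```python
import tensorflow as tf

# Synthetic stand-ins for the two directory datasets (hypothetical sizes).
# Each yields (image_batch, label_batch), like image_dataset_from_directory.
num_samples, batch_size = 8, 4
labels = tf.constant([0, 1, 2, 3, 4, 0, 1, 2], dtype=tf.int32)
spc_ds = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((num_samples, 150, 150, 3)), labels)
).batch(batch_size)
scl_ds = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((num_samples, 150, 150, 3)), labels)
).batch(batch_size)

# Zipping gives elements shaped ((spc_img, spc_lbl), (scl_img, scl_lbl)).
zipped = tf.data.Dataset.zip((spc_ds, scl_ds))


def to_two_input_element(spc, scl):
    """Regroup a zipped element into ((spc_img, scl_img), labels)."""
    spc_img, spc_lbl = spc
    scl_img, _ = scl  # assumed to carry the same labels as spc_lbl
    return (spc_img, scl_img), spc_lbl


train_ds = zipped.map(to_two_input_element, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)

# Each element now matches what a two-input model expects from fit().
(spc_img, scl_img), lbl = next(iter(train_ds))
print(spc_img.shape, scl_img.shape, lbl.shape)
```

Note that image_dataset_from_directory shuffles by default, so two independently created datasets will generally not yield matching labels at the same position. Passing shuffle=False to both calls and shuffling after the zip is one way to keep the pairs aligned, assuming both directories hold the same number of images per class.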
