Adelov

Reputation: 395

Batch size of 1 in custom data generator in Keras

I wrote this generator to produce data from 4 different datasets (dir1, dir2, dir3, dir4), but when I choose a batch size larger than 1 I get this error:

Traceback (most recent call last):
  File "cnn.py", line 201, in <module>
    callbacks = [cb_checkpointer, cb_early_stopper, reduce_lr])
  File "/home/user/.local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1732, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training_generator.py", line 220, in fit_generator
    reset_metrics=False)
  File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1508, in train_on_batch
    class_weight=class_weight)
  File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training.py", line 637, in _standardize_user_data
    training_utils.check_array_length_consistency(x, y, sample_weights)
  File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training_utils.py", line 235, in check_array_length_consistency
    str([x.shape for x in inputs]))
ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(1, 224, 224, 3), (2, 224, 224, 3), (2, 224, 224, 3), (2, 224, 224, 3)]

My code (generates batches for training a CNN on 4 different datasets):

def generate_generator_multiple(generator,dir1, dir2, dir3, dir4, batch_size, img_height,img_width):
    genX1 = generator.flow_from_directory(dir1,
                                  target_size = (img_height,img_width),
                                  class_mode = 'categorical',
                                  batch_size = batch_size,
                                  shuffle=False)


    genX2 = generator.flow_from_directory(dir2,
                                  target_size = (img_height,img_width),
                                  class_mode = 'categorical',
                                  batch_size = batch_size,
                                  shuffle=False)

    genX3 = generator.flow_from_directory(dir3,
                                  target_size = (img_height,img_width),
                                  class_mode = 'categorical',
                                  batch_size = batch_size,
                                  shuffle=False)

    genX4 = generator.flow_from_directory(dir4,
                                  target_size = (img_height,img_width),
                                  class_mode = 'categorical',
                                  batch_size = batch_size,
                                  shuffle=False) 
    while True:
        X1i = genX1.next()
        X2i = genX2.next()
        X3i = genX3.next()
        X4i = genX4.next()
        yield [X1i[0], X2i[0],X3i[0],X4i[0]], X2i[1]  #Yield both images and their mutual label

Upvotes: 0

Views: 932

Answers (2)

Adelov

Reputation: 395

I found the solution. The error occurred because, at the last step of a pass over the data, the generators did not all produce the same number of samples. The 4 datasets are not balanced, so their final batches can differ in size (it may also be related to the steps per epoch). I added an if test to the code so that only full batches are yielded:

def generate_generator_multiple(generator,dir1, dir2, dir3, dir4, batch_size, img_height,img_width):
    genX1 = generator.flow_from_directory(dir1,
                                  target_size = (img_height,img_width),
                                  class_mode = 'categorical',
                                  batch_size = batch_size,
                                  shuffle=False)


    genX2 = generator.flow_from_directory(dir2,
                                  target_size = (img_height,img_width),
                                  class_mode = 'categorical',
                                  batch_size = batch_size,
                                  shuffle=False)

    genX3 = generator.flow_from_directory(dir3,
                                  target_size = (img_height,img_width),
                                  class_mode = 'categorical',
                                  batch_size = batch_size,
                                  shuffle=False)

    genX4 = generator.flow_from_directory(dir4,
                                  target_size = (img_height,img_width),
                                  class_mode = 'categorical',
                                  batch_size = batch_size,
                                  shuffle=False) 
    while True:
        X1i = genX1.next()
        X2i = genX2.next()
        X3i = genX3.next()
        X4i = genX4.next()
        # Skip this step unless every generator returned a full batch
        if all(len(batch[1]) == batch_size for batch in (X1i, X2i, X3i, X4i)):
            yield [X1i[0], X2i[0], X3i[0], X4i[0]], X2i[1]  # Yield both images and their mutual label
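
To illustrate why the shapes in the error differ, here is a minimal sketch of the batch arithmetic. A Keras directory iterator yields a smaller final batch when the number of images is not a multiple of batch_size. The helper name batch_lengths and the dataset sizes below are hypothetical, chosen only so that the result matches the shapes in the traceback.

```python
def batch_lengths(n_samples, batch_size):
    """Sizes of the batches produced by one pass over n_samples items."""
    full, remainder = divmod(n_samples, batch_size)
    # The final batch is smaller when n_samples is not a multiple of batch_size.
    return [batch_size] * full + ([remainder] if remainder else [])


# Hypothetical dataset sizes (not taken from the question).
sizes = {"dir1": 5, "dir2": 6, "dir3": 6, "dir4": 6}
order = ("dir1", "dir2", "dir3", "dir4")
per_dir = {name: batch_lengths(n, 2) for name, n in sizes.items()}

# Batch sizes that the four generators would return at the third step.
third_step = [per_dir[name][2] for name in order]
print(third_step)
```

With batch_size = 1 every batch has exactly one sample, which would explain why the error only appears for larger batch sizes.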

Upvotes: 0

Raunak Shamnani

Reputation: 81

It looks like the number of examples in X1i[0] doesn't match the number of examples in X1i[1]. Can you double-check it? Also, in the future, please attach the full error, since it helps in debugging the issue.
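
One way to run that check is a small helper that reports how many samples each array in a yielded batch contains. This is only a sketch: sample_counts and is_consistent are hypothetical names, and it assumes the generator yields an (inputs, labels) pair where inputs is a list, as in the question. It relies on len(), which returns the size of the first axis for NumPy arrays.

```python
def sample_counts(batch):
    """Return ([samples per input array], samples in labels) for one batch."""
    inputs, labels = batch
    return [len(x) for x in inputs], len(labels)


def is_consistent(batch):
    """True when every input array has as many samples as the labels."""
    input_counts, label_count = sample_counts(batch)
    return all(n == label_count for n in input_counts)


# Stand-in batch built from plain lists, mirroring the sample counts in the
# traceback: the first input has 1 sample, the others and the labels have 2.
example_batch = ([[0], [0, 0], [0, 0], [0, 0]], [1, 1])
print(sample_counts(example_batch))
print(is_consistent(example_batch))
```

Calling sample_counts on each item drawn from the generator, for at least one full pass over the data, should show at which step the counts diverge.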

Upvotes: 1
