Reputation: 1
I wrote a data generator for a large dataset of .mat files for Keras.
Here is my code. I am trying to solve a 3-class problem whose data are in different folders (one, two, three); each batch is filled randomly from these folders.
import random
import h5py
import numpy

# batch_size, img_rows and img_cols are defined globally elsewhere in my script

def generate_arrays_from_file(path, nc1, nc2, nc3):
    while True:
        for line in range(batch_size):
            Data, y = fetch_data(path, nc1, nc2, nc3)
            yield (Data, y)

def fetch_data(path, nc1, nc2, nc3):
    trainData = numpy.empty(shape=[batch_size, img_rows, img_cols])
    y = []
    for line in range(batch_size):
        labelClass = random.randint(0, 2)
        if labelClass == 0:
            random_num = random.randint(1, nc1)
            file_name = path + '/' + 'one/one-' + str(random_num) + '.mat'
        elif labelClass == 1:
            random_num = random.randint(1, nc2)
            file_name = path + '/' + 'two/two-' + str(random_num) + '.mat'
        else:
            random_num = random.randint(1, nc3)
            file_name = path + '/' + 'three/three-' + str(random_num) + '.mat'
        matfile = h5py.File(file_name, 'r')
        x = matfile['data'][()]
        x = numpy.transpose(x, axes=(1, 0))
        trainData[line, :, :] = x
        y.append(labelClass)
    trainData = trainData.reshape(trainData.shape[0], img_rows, img_cols, 1)
    return trainData, y
This code works, and batch_size is set to 16, but the Keras output looks like this:
1/50000 [..............................] - ETA: 65067s - loss: 1.1666 - acc: 0.2500
2/50000 [..............................] - ETA: 34057s - loss: 1.4812 - acc: 0.2188
3/50000 [..............................] - ETA: 24202s - loss: 1.6554 - acc: 0.1875
4/50000 [..............................] - ETA: 18799s - loss: 1.5569 - acc: 0.2344
5/50000 [..............................] - ETA: 15611s - loss: 1.4662 - acc: 0.2625
6/50000 [..............................] - ETA: 13863s - loss: 1.4563 - acc: 0.2500
8/50000 [..............................] - ETA: 10978s - loss: 1.3903 - acc: 0.2734
9/50000 [..............................] - ETA: 10402s - loss: 1.3595 - acc: 0.2778
10/50000 [..............................] - ETA: 10253s - loss: 1.3333 - acc: 0.2875
11/50000 [..............................] - ETA: 10389s - loss: 1.3195 - acc: 0.2784
12/50000 [..............................] - ETA: 10411s - loss: 1.3063 - acc: 0.2760
13/50000 [..............................] - ETA: 10360s - loss: 1.2896 - acc: 0.2788
14/50000 [..............................] - ETA: 10424s - loss: 1.2772 - acc: 0.2768
15/50000 [..............................] - ETA: 10464s - loss: 1.2660 - acc: 0.2750
16/50000 [..............................] - ETA: 10483s - loss: 1.2545 - acc: 0.2852
17/50000 [..............................] - ETA: 10557s - loss: 1.2446 - acc: 0.3015
It seems the batch_size isn't being taken into account. Can you tell me why? Thank you.
Upvotes: 0
Views: 749
Reputation: 86600
Each step in fit_generator is one batch drawn from the generator (the fit_generator call itself is not shown in the question). So:
- The steps_per_epoch parameter passed to fit_generator is how many batches will be drawn from the generator per epoch. Each step (or batch) is one line in that printed output.
- The epochs parameter defines how many times everything is repeated.
- The size of each batch is defined by the generator itself, and it is not shown in the printed output: the progress counter counts steps, not samples.
It's clear from the output that you chose steps_per_epoch = 50000. So Keras assumes you're going to train on 50000 batches per epoch, and it will retrieve 50000 batches from the generator, regardless of how large each batch is.
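To make one epoch correspond to a single pass over the data, steps_per_epoch should be the number of training samples divided by the batch size. A minimal sketch of that arithmetic (the total of 800,000 samples here is purely hypothetical, not from the question):

```python
# Hypothetical numbers: with 800,000 training samples and the generator
# yielding batches of 16, one epoch is 800000 // 16 = 50000 steps.
total_samples = 800000   # assumed for illustration only
batch_size = 16
steps_per_epoch = total_samples // batch_size
print(steps_per_epoch)   # 50000
```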
Checking the batch size:
There are two possible ways to check your batch size:
From the generator:
generator = generate_arrays_from_file(path, nc1, nc2, nc3)
generatorSampleX, generatorSampleY = next(generator)  # generator.next() in Python 2
print(generatorSampleX.shape)
# This advances the generator past its first element, so it's best to create
# the generator again before passing it to training.
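As a self-contained illustration (using a dummy stand-in for your generator, since the real .mat data isn't available here), the shape of the first yielded array directly reveals the batch size on its first axis:

```python
import numpy as np

def dummy_generator(batch_size=16, img_rows=8, img_cols=8):
    # Stand-in for generate_arrays_from_file: yields (X, y) batches forever.
    while True:
        X = np.zeros((batch_size, img_rows, img_cols, 1))
        y = [0] * batch_size
        yield X, y

gen = dummy_generator()
sample_X, sample_y = next(gen)
print(sample_X.shape)  # (16, 8, 8, 1) -> first axis is the batch size
```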
From a callback:
from keras.callbacks import LambdaCallback
callback = LambdaCallback(on_batch_end=lambda batch,logs:print(logs))
model.fit_generator(........, callbacks = [callback])
Upvotes: 2