ahp

Reputation: 1

Batch size is smaller than what I specified in Keras

I wrote a data generator for Keras for a large dataset consisting of .mat files.

Here is my code. I am trying to solve a 3-class problem whose data live in different folders (one, two, three); each batch is filled randomly from these folders.

import random

import h5py
import numpy

# batch_size, img_rows and img_cols are defined globally elsewhere

def generate_arrays_from_file(path, nc1, nc2, nc3):
    while True:
        for line in range(batch_size):
            # each yield is one full batch built by fetch_data
            Data, y = fetch_data(path, nc1, nc2, nc3)
            yield (Data, y)

def fetch_data(path, nc1, nc2, nc3):
    trainData = numpy.empty(shape=[batch_size, img_rows, img_cols])
    y = []
    for line in range(batch_size):
        # pick a class at random, then a random file from that class's folder
        labelClass = random.randint(0, 2)
        if labelClass == 0:
            random_num = random.randint(1, nc1)
            file_name = path + '/' + 'one/one-' + str(random_num) + '.mat'
        elif labelClass == 1:
            random_num = random.randint(1, nc2)
            file_name = path + '/' + 'two/two-' + str(random_num) + '.mat'
        else:
            random_num = random.randint(1, nc3)
            file_name = path + '/' + 'three/three-' + str(random_num) + '.mat'

        with h5py.File(file_name, 'r') as matfile:
            x = matfile['data'][()]
        x = numpy.transpose(x, axes=(1, 0))

        trainData[line, :, :] = x
        y.append(labelClass)

    # add the trailing channels dimension that Keras expects
    trainData = trainData.reshape(trainData.shape[0], img_rows, img_cols, 1)

    return trainData, y

This code works, and batch_size is set to 16, but the Keras output looks like this:

    1/50000 [..............................] - ETA: 65067s - loss: 1.1666 - acc: 0.2500
    2/50000 [..............................] - ETA: 34057s - loss: 1.4812 - acc: 0.2188
    3/50000 [..............................] - ETA: 24202s - loss: 1.6554 - acc: 0.1875
    4/50000 [..............................] - ETA: 18799s - loss: 1.5569 - acc: 0.2344
    5/50000 [..............................] - ETA: 15611s - loss: 1.4662 - acc: 0.2625
    6/50000 [..............................] - ETA: 13863s - loss: 1.4563 - acc: 0.2500
    8/50000 [..............................] - ETA: 10978s - loss: 1.3903 - acc: 0.2734
    9/50000 [..............................] - ETA: 10402s - loss: 1.3595 - acc: 0.2778
   10/50000 [..............................] - ETA: 10253s - loss: 1.3333 - acc: 0.2875
   11/50000 [..............................] - ETA: 10389s - loss: 1.3195 - acc: 0.2784
   12/50000 [..............................] - ETA: 10411s - loss: 1.3063 - acc: 0.2760
   13/50000 [..............................] - ETA: 10360s - loss: 1.2896 - acc: 0.2788
   14/50000 [..............................] - ETA: 10424s - loss: 1.2772 - acc: 0.2768
   15/50000 [..............................] - ETA: 10464s - loss: 1.2660 - acc: 0.2750
   16/50000 [..............................] - ETA: 10483s - loss: 1.2545 - acc: 0.2852
   17/50000 [..............................] - ETA: 10557s - loss: 1.2446 - acc: 0.3015

It seems batch_size isn't being taken into account. Can you tell me why? Thank you.

Upvotes: 0

Views: 749

Answers (1)

Daniel Möller

Reputation: 86600

Each step printed in that output is one batch drawn from the generator; the fit_generator call (code not shown in the question) determines how many of those steps run.

So:

  • The batch size is defined by the generator; it does not appear in the printed counter.
  • The steps_per_epoch parameter passed to fit_generator is how many batches will be drawn from the generator per epoch. Each step (i.e. each batch) is one line in that output.
  • The epochs parameter defines how many times the whole process is repeated.

It's clear from the output that you chose steps_per_epoch = 50000, so Keras assumes you want to train on 50000 batches and will retrieve 50000 batches from the generator per epoch. The size of each of those batches is still whatever the generator yields.
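As a minimal sketch of how the two numbers relate (assuming the old Keras fit_generator API; total_samples is a hypothetical figure, not something from the question):

batch_size = 16          # set inside the generator, via fetch_data
total_samples = 800000   # hypothetical dataset size, for illustration only

model.fit_generator(
    generate_arrays_from_file(path, nc1, nc2, nc3),
    steps_per_epoch=total_samples // batch_size,  # batches per epoch; here 50000
    epochs=10)

With these numbers Keras would print exactly 50000 steps per epoch, and each of those steps processes 16 samples.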

Checking the batch size:

There are two possible ways to check your batch size:

  • Get one sample from your generator and check its shape
  • Create a callback that prints you the logs

From generator:

generator = generate_arrays_from_file(path, nc1, nc2, nc3)
generatorSampleX, generatorSampleY = next(generator)  # generator.next() in Python 2
print(generatorSampleX.shape)  # the first dimension is the batch size

# This consumes one element from the generator, so it's best to create
# the generator again before passing it to training.

From callback:

from keras.callbacks import LambdaCallback

callback = LambdaCallback(on_batch_end=lambda batch, logs: print(logs))
model.fit_generator(........, callbacks=[callback])
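In the Keras versions that provide fit_generator, the logs dict passed to on_batch_end includes a 'size' entry with the number of samples in the current batch, so printing the logs shows the actual batch size at every step.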

Upvotes: 2
