Reputation: 1
I wrote a data generator for a large dataset of .mat files for Keras.
Here is my code. I am trying to solve a 3-class problem whose data are in different folders (one, two, three); each batch is filled randomly from these folders.
import random
import h5py
import numpy

# batch_size, img_rows and img_cols are defined globally elsewhere in my script

def generate_arrays_from_file(path, nc1, nc2, nc3):
    while True:
        for line in range(batch_size):
            Data, y = fetch_data(path, nc1, nc2, nc3)
            yield (Data, y)

def fetch_data(path, nc1, nc2, nc3):
    trainData = numpy.empty(shape=[batch_size, img_rows, img_cols])
    y = []
    for line in range(batch_size):
        labelClass = random.randint(0, 2)
        if labelClass == 0:
            random_num = random.randint(1, nc1)
            file_name = path + '/' + 'one/one-' + str(random_num) + '.mat'
        elif labelClass == 1:
            random_num = random.randint(1, nc2)
            file_name = path + '/' + 'two/two-' + str(random_num) + '.mat'
        else:
            random_num = random.randint(1, nc3)
            file_name = path + '/' + 'three/three-' + str(random_num) + '.mat'
        matfile = h5py.File(file_name, 'r')
        x = matfile['data'][()]
        x = numpy.transpose(x, axes=(1, 0))
        trainData[line, :, :] = x
        y.append(labelClass)
    trainData = trainData.reshape(trainData.shape[0], img_rows, img_cols, 1)
    return trainData, y
This code works, and batch_size is set to 16, but the Keras output looks like this:
1/50000 [..............................] - ETA: 65067s - loss: 1.1666 - acc: 0.2500
2/50000 [..............................] - ETA: 34057s - loss: 1.4812 - acc: 0.2188
3/50000 [..............................] - ETA: 24202s - loss: 1.6554 - acc: 0.1875
4/50000 [..............................] - ETA: 18799s - loss: 1.5569 - acc: 0.2344
5/50000 [..............................] - ETA: 15611s - loss: 1.4662 - acc: 0.2625
6/50000 [..............................] - ETA: 13863s - loss: 1.4563 - acc: 0.2500
8/50000 [..............................] - ETA: 10978s - loss: 1.3903 - acc: 0.2734
9/50000 [..............................] - ETA: 10402s - loss: 1.3595 - acc: 0.2778
10/50000 [..............................] - ETA: 10253s - loss: 1.3333 - acc: 0.2875
11/50000 [..............................] - ETA: 10389s - loss: 1.3195 - acc: 0.2784
12/50000 [..............................] - ETA: 10411s - loss: 1.3063 - acc: 0.2760
13/50000 [..............................] - ETA: 10360s - loss: 1.2896 - acc: 0.2788
14/50000 [..............................] - ETA: 10424s - loss: 1.2772 - acc: 0.2768
15/50000 [..............................] - ETA: 10464s - loss: 1.2660 - acc: 0.2750
16/50000 [..............................] - ETA: 10483s - loss: 1.2545 - acc: 0.2852
17/50000 [..............................] - ETA: 10557s - loss: 1.2446 - acc: 0.3015
It seems the batch_size isn't being taken into account. Can you tell me why? Thank you.
Upvotes: 0
Views: 749
Reputation: 86600
Each step in fit_generator is one batch drawn from the generator (the fit_generator call itself is not shown in the question). So:
- The steps_per_epoch parameter passed to fit_generator is how many batches will be drawn from the generator per epoch. Each step (or batch) is one line in that printed output.
- The epochs parameter defines how many times everything is repeated.
- The size of each batch is defined by the generator itself, and it is not shown in the printed output: the progress counter counts steps, not samples.
It's clear from the output that you chose steps_per_epoch = 50000. So Keras assumes you're going to train on 50000 batches per epoch, and it will retrieve 50000 batches from the generator, regardless of how large each batch is.
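To make one epoch correspond to a single pass over the data, steps_per_epoch should be the number of training samples divided by the batch size. A minimal sketch of that arithmetic (the total of 800,000 samples here is purely hypothetical, not from the question):

```python
# Hypothetical numbers: with 800,000 training samples and the generator
# yielding batches of 16, one epoch is 800000 // 16 = 50000 steps.
total_samples = 800000   # assumed for illustration only
batch_size = 16
steps_per_epoch = total_samples // batch_size
print(steps_per_epoch)   # 50000
```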
Checking the batch size:
There are two possible ways to check your batch size:
From the generator:
generator = generate_arrays_from_file(path, nc1, nc2, nc3)
generatorSampleX, generatorSampleY = next(generator)  # generator.next() in Python 2
print(generatorSampleX.shape)
# This advances the generator past its first element, so it's best to create
# the generator again before passing it to training.
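As a self-contained illustration (using a dummy stand-in for your generator, since the real .mat data isn't available here), the shape of the first yielded array directly reveals the batch size on its first axis:

```python
import numpy as np

def dummy_generator(batch_size=16, img_rows=8, img_cols=8):
    # Stand-in for generate_arrays_from_file: yields (X, y) batches forever.
    while True:
        X = np.zeros((batch_size, img_rows, img_cols, 1))
        y = [0] * batch_size
        yield X, y

gen = dummy_generator()
sample_X, sample_y = next(gen)
print(sample_X.shape)  # (16, 8, 8, 1) -> first axis is the batch size
```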
From a callback:
from keras.callbacks import LambdaCallback
callback = LambdaCallback(on_batch_end=lambda batch,logs:print(logs))
model.fit_generator(........, callbacks = [callback])
Upvotes: 2