Dr Sokoban

Reputation: 1638

Keras: Generator runs out of data when starting second epoch

I have the following generator:

import numpy as np

def customGenerator(generator, indexes):
    for i in indexes:
        x, y = generator[i]
        yield (np.squeeze(x),
               {'outputsA': y[:, 4:6], 'outputsB': y[:, 11:],
                'outputsC': y[:, 10]})

and then the lines that train the model (I am omitting some lines that are unrelated to the problem):

randomize = np.arange( len(generator) )
np.random.shuffle(randomize)
trainLimit = int( 0.9*len(generator) )

model.fit(x=customGenerator(generator, randomize[:trainLimit]), y=None,
          validation_data=customGenerator(generator, randomize[trainLimit:]),
          epochs=1000, steps_per_epoch=trainLimit)

Setting steps_per_epoch to None (or just removing this argument) produces the same error.

This code works well during the first epoch, but when starting the second epoch it says it ran out of data:

Epoch 1/1000
2534/2534 [==============================] - 1124s 443ms/step - loss: 20.3274 - outputsA_loss: 8.2611 - outputsB_loss: 11.8572 - outputsC_loss: 0.2091 - val_loss: 11.4947 - val_outputsA_loss: 3.3958 - val_outputsB_loss: 7.9044 - val_outputsC_loss: 0.1945
Epoch 2/1000
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 2534000 batches). You may need to use the repeat() function when building your dataset.

This warning is not just a warning; it stops execution completely.

It seems that the generator is only run through once, whereas I expected it to be restarted at the start of each epoch.

I don't really know how to achieve that.

I could create an input array that repeats the original data 1000 times, but that would use a lot of memory. There has to be a way to tell Keras to restart the generator at every epoch, but I don't know how.

Upvotes: 1

Views: 1180

Answers (1)

Feodoran
Feodoran

Reputation: 1822

The generator stops once its for loop is exhausted. To repeat the data indefinitely, wrap the for loop in a while loop:

def customGenerator(generator, indexes):
    while True:
        # np.random.shuffle shuffles in place and returns None,
        # so don't assign its result back to indexes.
        np.random.shuffle(indexes)  # reshuffle every new epoch
        for i in indexes:
            x, y = generator[i]
            yield (np.squeeze(x),
                   {'outputsA': y[:, 4:6], 'outputsB': y[:, 11:],
                    'outputsC': y[:, 10]})
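
To sanity-check that a `while True` wrapper really does keep yielding across epoch boundaries, here is a minimal, Keras-free sketch; the batch list and the `repeating_generator` name are made up for illustration:

```python
import itertools
import numpy as np

# Hypothetical stand-in for the Keras Sequence: four (x, y) batches.
fake_batches = [(np.zeros((1, 2)), np.arange(12).reshape(1, 12))
                for _ in range(4)]

def repeating_generator(batches, indexes):
    # The while loop makes the generator infinite, so the consumer can
    # keep pulling batches every epoch instead of exhausting it in one pass.
    while True:
        np.random.shuffle(indexes)  # shuffles in place and returns None
        for i in indexes:
            yield batches[i]

gen = repeating_generator(fake_batches, np.arange(len(fake_batches)))

# Drawing more batches than a single pass contains shows that it repeats:
drawn = list(itertools.islice(gen, 10))
print(len(drawn))  # 10, i.e. well past the 4 batches of one pass
```

Note that `itertools.islice` is needed here because `list(gen)` on an infinite generator would never return, which is exactly the property Keras relies on for multi-epoch training.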

Upvotes: 4
