HWilmer

Reputation: 566

How to properly set up a dataset for training a Keras model

I am trying to create a dataset for audio recognition with a simple Keras sequential model.

This is the function I am using to create the model:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def dnn_model(input_shape, output_shape):
    model = keras.Sequential()
    model.add(keras.Input(input_shape))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dense(output_shape, activation="softmax"))
    # The last layer already applies a softmax, so the loss must be
    # computed from probabilities, not from raw logits.
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
                  metrics=['acc'])

    model.summary()

    return model
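For orientation, a hedged usage sketch; the feature shape and class count below are placeholders, not values from the question:

# Hypothetical dimensions: e.g. 15 context frames of 39 features each,
# and 135 target classes (HMM states). Adjust to your own pipeline.
model = dnn_model(input_shape=(15, 39), output_shape=135)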

And I generate my training data with this generator function:

def generator(x_dirs, y_dirs, hmm, sampling_rate, parameters):
    # Derive the analysis window and hop size (in samples) from the
    # parameters; the window is rounded up to the next power of two.
    window_size_samples = tools.sec_to_samples(parameters['window_size'], sampling_rate)
    window_size_samples = 2 ** tools.next_pow2(window_size_samples)
    hop_size_samples = tools.sec_to_samples(parameters['hop_size'], sampling_rate)

    # Yield one (features, targets) pair per file.
    for i in range(len(x_dirs)):
        features = fe.compute_features_with_context(x_dirs[i], **parameters)
        praat = tools.praat_file_to_target(y_dirs[i],
                                           sampling_rate,
                                           window_size_samples,
                                           hop_size_samples,
                                           hmm)
        yield features, praat

The variables x_dirs and y_dirs contain lists of paths to the audio files and their label files. In total I have 8623 files to train my model. This is how I train it:

def train_model(model, model_dir, x_dirs, y_dirs, hmm, sampling_rate, parameters, steps_per_epoch=10, epochs=10):

    # steps_per_epoch is its own fit() argument: it tells fit() how many
    # batches to draw from the generator per epoch. It is not a batch size.
    model.fit(generator(x_dirs, y_dirs, hmm, sampling_rate, parameters),
              epochs=epochs,
              steps_per_epoch=steps_per_epoch)
    return model

My problem now is that if I pass all 8623 files, fit() uses all 8623 files for training in the first epoch, and after the first epoch it complains that it needs steps_per_epoch * epochs batches to train the model.

I tested this with a sliced list of only 10 of the 8623 files, but then TensorFlow complains that 100 batches are needed.

So how should my generator yield data so that this works? I always thought that steps_per_epoch just limits the amount of data consumed per epoch.

Upvotes: 0

Views: 90

Answers (1)

Thomas Schillaci

Reputation: 2453

The fit function is going to exhaust your generator: once it has yielded all 8623 samples, it won't be able to yield any batches anymore.
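This is ordinary Python generator behaviour, nothing Keras-specific; a minimal sketch:

def gen():
    for i in range(3):
        yield i

g = gen()
print(list(g))  # [0, 1, 2] -- the generator is now exhausted
print(list(g))  # []        -- further iteration yields nothing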

You can solve the issue like this:

def generator(x_dirs, y_dirs, hmm, sampling_rate, parameters, epochs=1):
    # These values do not change between epochs, so compute them once.
    window_size_samples = tools.sec_to_samples(parameters['window_size'], sampling_rate)
    window_size_samples = 2 ** tools.next_pow2(window_size_samples)
    hop_size_samples = tools.sec_to_samples(parameters['hop_size'], sampling_rate)

    # Re-iterate over the files once per epoch, so the generator can
    # serve steps_per_epoch * epochs batches in total.
    for epoch in range(epochs):  # or while True:
        for i in range(len(x_dirs)):
            features = fe.compute_features_with_context(x_dirs[i], **parameters)
            praat = tools.praat_file_to_target(y_dirs[i],
                                               sampling_rate,
                                               window_size_samples,
                                               hop_size_samples,
                                               hmm)
            yield features, praat
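For completeness, a hedged sketch of a matching training call, reusing the names from the question; using len(x_dirs) as the step count assumes the generator yields one (features, praat) pair per file, as above:

model.fit(generator(x_dirs, y_dirs, hmm, sampling_rate, parameters, epochs=epochs),
          epochs=epochs,
          steps_per_epoch=len(x_dirs))  # one yielded pair counts as one step

With while True: instead of the epoch loop, the generator never runs dry and steps_per_epoch alone defines where one epoch ends.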

Upvotes: 1
