Reputation: 11
I'm training my keras dense models on very large datasets.
For practical reasons, I am saving them on disk in separate .txt files. I have 1e4 text files, each containing 1e4 examples.
I would like to find a way to fit my keras model on this dataset as a whole. For now, I am only able to use "model.fit" on individual text files, i.e.:
for k in range(10000):
    X = np.loadtxt('/path/X_'+str(k)+'.txt')
    Y = np.loadtxt('/path/Y_'+str(k)+'.txt')
    mod = model.fit(x=X, y=Y, batch_size=batch_size, epochs=epochs)
This is problematic if, for instance, I want to perform several epochs over the whole dataset.
Ideally, I would like to have a dataloader function that could be used in the following way to feed all the sub-datasets as a single one:
mod = model.fit(dataloader('/path/'), batch_size=batch_size, epochs=epochs)
I think I found what I want, but only for datasets composed of images: tf.keras.preprocessing.image.ImageDataGenerator.flow_from_directory
Is there any tf/keras function that does something similar, but for datasets that are not composed of images?
Thanks!
Upvotes: 1
Views: 317
Reputation: 1508
You can create a generator function and then use the tensorflow Dataset class with its from_generator method to create a dataset; see below a dummy example:
import numpy as np
import tensorflow as tf
from tensorflow.data import Dataset

def mygenerator():
    for k in range(1000):
        # yield arrays whose dtypes match the output_signature below
        x = np.random.normal(size=(1000,)).astype(np.float32)
        y = np.random.randint(low=0, high=5, size=1000).astype(np.int32)
        yield x, y

mydataset = Dataset.from_generator(
    mygenerator,
    output_signature=(tf.TensorSpec(shape=(1000,), dtype=tf.float32),
                      tf.TensorSpec(shape=(1000,), dtype=tf.int32)))
mytraindata = mydataset.batch(batch_size)
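To apply this to your case (one X_k.txt / Y_k.txt pair per index), you can make the generator read the files themselves. Below is a minimal sketch assuming each X file is a 2D array of shape (n_examples, n_features) and each Y file a 1D array of labels; the path pattern, n_files and n_features are placeholders you would replace with your own values:
import numpy as np
import tensorflow as tf

def file_generator(path='/path/', n_files=10000):
    # Read each X_k.txt / Y_k.txt pair and yield one (example, label) at a time
    for k in range(n_files):
        X = np.loadtxt(path + 'X_' + str(k) + '.txt').astype(np.float32)
        Y = np.loadtxt(path + 'Y_' + str(k) + '.txt').astype(np.float32)
        for x, y in zip(X, Y):
            yield x, y

n_features = 100  # placeholder: the number of columns in each X_k.txt

dataset = tf.data.Dataset.from_generator(
    file_generator,
    output_signature=(tf.TensorSpec(shape=(n_features,), dtype=tf.float32),
                      tf.TensorSpec(shape=(), dtype=tf.float32)))

dataset = dataset.batch(batch_size)
mod = model.fit(dataset, epochs=epochs)  # no batch_size here: the dataset is already batched
Note that when you pass a tf.data.Dataset to model.fit you don't pass batch_size (batching is done by the dataset), and each epoch re-runs the generator over all files. Adding .shuffle(...) before .batch(...) can help if you don't want examples served in file order.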
Upvotes: 2