SaigyoujiYuyuko

Reputation: 53

Best practice of loading a huge image dataset for ML

I'm playing around with an image dataset on Kaggle (https://www.kaggle.com/competitions/paddy-disease-classification/data). The dataset contains about 10,000 images at 480×640 resolution.
When I try to load this dataset with the following code,

# eagerly load and normalize every image into one big in-memory list
for (label, file) in dataset_file_img(dataset_path):
    image = load_img_into_tensor(file)
    data.append(image / 255)  # dividing promotes uint8 pixels to float32 (4x the memory)
    data_label.append(label)

it consumes about 20 GB of RAM.

What is the best practice for loading a dataset like this?
Any help would be appreciated!

Upvotes: 1

Views: 901

Answers (2)

münsteraner

Reputation: 11

If you don't have enough GPU computing power, ImageDataGenerator will probably become a bottleneck. As suggested by Shubham, use tf.data instead, which is the best option as far as I know; a minimal sketch is below.
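For example, a tf.data pipeline that decodes images lazily, one batch at a time, instead of holding all decoded tensors in RAM. The folder layout train_images/<label>/<file>.jpg and the 480×640 size are assumptions taken from the question:

import tensorflow as tf

# List file paths only; nothing is read or decoded yet.
files = tf.data.Dataset.list_files("train_images/*/*.jpg", shuffle=True)

def load_image(path):
    # parent folder name = class label (a string; map it to an integer index before training)
    label = tf.strings.split(path, "/")[-2]
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, [480, 640]) / 255.0  # normalize to [0, 1]
    return image, label

ds = (files
      .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)  # decode in parallel
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))  # overlap input loading with training

Only the prefetched batches live in memory at any time, so RAM usage stays flat regardless of dataset size.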

Upvotes: 0

s510

Reputation: 2822

Try the following from Keras (a sketch of option 2 follows this list):

  1. ImageDataGenerator (tf.keras.preprocessing.image.ImageDataGenerator)

  2. the image_dataset_from_directory function (tf.keras.utils.image_dataset_from_directory)
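For instance, a minimal sketch of option 2, assuming the one-subfolder-per-class layout (train_images/<class_name>/<file>.jpg) that this utility expects:

import tensorflow as tf

# Returns a batched tf.data.Dataset that reads and decodes images on demand.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "train_images",          # assumed dataset root
    image_size=(480, 640),   # resize on load
    batch_size=32,
    label_mode="int",
)

# Rescale per batch instead of materializing the whole normalized dataset.
rescale = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = (train_ds
            .map(lambda x, y: (rescale(x), y))
            .prefetch(tf.data.AUTOTUNE))

The result is an ordinary tf.data.Dataset, so it can be passed directly to model.fit(train_ds, ...).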

Upvotes: 1
