manoelpqueiroz

Reputation: 637

How to use keras image_dataset_from_directory with custom structures?

Keras image_dataset_from_directory inside the preprocessing module takes a path as an argument and automatically infers the classes when those images are stored in separate subfolders. In my case, however, I have a single folder and image classes are then specified in a DataFrame.

.
├── datasets
│   ├── sample_submit.csv
│   ├── test_images
│   │   ├── test_0000.jpg
│   │   ├── test_0001.jpg
│   │   ├── test_0002.jpg
│   │   └── ...
│   ├── test_images.csv
│   ├── train_images
│   │   ├── train_0000.jpg
│   │   ├── train_0001.jpg
│   │   ├── train_0002.jpg
│   │   └── ...
│   └── train_images.csv
└── model.py

TensorFlow's documentation specifies that when you are not inferring the labels, a list or tuple must be specified, which I get from the DataFrame df. However, when I specify the image folder, TensorFlow raises a ValueError because it has found no images:

In [1]: import pandas as pd
   ...: from tensorflow import keras
   ...:
   ...: df = pd.read_csv('datasets/train_images.csv')
   ...: tds = keras.preprocessing\
   ...:    .image_dataset_from_directory('datasets/train_images', list(df['class']),
   ...:                                  validation_split=0.2, subset='training',
   ...:                                  seed=123, image_size=(180, 180))

ValueError: Expected the lengths of `labels` to match the number of files in the target directory. len(labels) is 1102 while we found 0 files in datasets/train_images.

Why does keras not recognise the images within the folder? I have tried setting the "full" relative path with ./datasets/train_images, adding a slash with datasets/train_images/ and also the absolute path, to no avail. What is missing here? Alternatively, is there a more efficient approach in this case where I can still get the train/test split?


EDIT: It seems there is a limitation with keras and this question originally laid it out, but remained too vague to get to the heart of the matter.

Plain and clear: keras seems to always scrape the subfolders of the directory argument for images and build the dataset. The workaround to enable the loading of images is to wrap an additional folder (outer_train) and pass it to directory.

However, I still have problems with this approach, because now keras seems unable to take the custom classes passed as a list and outputs "Found 1102 files belonging to 1 classes." (in this case, the single class is the name of the now subfolder train_images), so any help is still appreciated.

Upvotes: 2

Views: 1886

Answers (1)

Simone Starace

Reputation: 166

"keras seems to always scrape the subfolders of the directory argument for images and build the dataset. The workaround to enable the loading of images is to wrap an additional folder (outer_train) and pass it to directory."

The problem is that the image_dataset_from_directory method expects a directory that contains other directories. It collects the images from inside those subdirectories, not from the top level of the directory you pass in.
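The scanning behaviour described above can be pictured with a small stand-alone sketch. This is not Keras's actual implementation: count_images_in_subfolders and IMAGE_EXTENSIONS are hypothetical names, and the code only imitates the idea that files are looked up inside the subfolders of the directory argument.

```python
from pathlib import Path

# Hypothetical set of extensions, chosen for illustration only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images_in_subfolders(directory):
    """Count image files found one level down, inside subfolders only."""
    total = 0
    for entry in Path(directory).iterdir():
        if not entry.is_dir():
            continue  # files sitting directly in `directory` are ignored
        total += sum(
            1
            for f in entry.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return total
```

Under that assumption, pointing the function at datasets/train_images (which has no subfolders) gives 0, in line with the "we found 0 files" message, while pointing it at a wrapper folder that contains train_images finds the images.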

"However, I still have problems with this approach, because now keras seems unable to take the custom classes passed as a list and outputs."

I don't think you can read images like that. If you want the method to read the images of a custom class, you have to place a folder holding that class's images inside the directory you pass in, like this:

  • directory_to_read/
    • class__1/
      • img1
      • img2
    • class__2/
      • img1
      • img2
    • custom_class/
      • img1
      • img2
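If the classes only exist in the CSV, one way to get to that layout is to copy each image into a folder named after its class. The following is a minimal sketch using only the standard library. The function name sort_into_class_folders is hypothetical, and so is the "filename" column: the question only shows a "class" column, so adjust the column names to whatever train_images.csv actually contains.

```python
import csv
import shutil
from pathlib import Path


def sort_into_class_folders(csv_path, image_dir, out_dir,
                            filename_col="filename", class_col="class"):
    """Copy each image listed in the CSV into out_dir/<class>/.

    `filename_col` is an assumed column name; the source only confirms
    that a `class` column exists.
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    copied = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            src = image_dir / row[filename_col]
            if not src.is_file():
                continue  # skip rows whose image is missing on disk
            dest = out_dir / str(row[class_col])
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest / src.name)  # originals stay untouched
            copied += 1
    return copied
```

The resulting out_dir can then be passed as the directory argument with labels left at its default of 'inferred', together with the same validation_split, subset and seed arguments used in the question.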

Upvotes: 2
