Todayisagreatday

Reputation: 43

Loading images in unlabelled subfolders using Keras's ImageDataGenerator

I'm having trouble loading the Adience image dataset with Keras's ImageDataGenerator. The images (.jpg) are spread across multiple subfolders that are named after user IDs, not labels.

directory/
     7153718@N04/
           landmark_aligned_face.2282.11597961815_4916cbf003_o.jpg
           landmark_aligned_face.2282.11598013005_240c2bc9c7_o.jpg
           ...
     7285955@N06/
           landmark_aligned_face.2049.9486667267_73ac31c862_o.jpg
           landmark_aligned_face.2050.9486613949_909254ccf9_o.jpg
           ...

The label.txt file, which holds the labels of the images, uses the following format:

data/30601258@N03/landmark_aligned_face.2.10424815813_e94629b1ec_o.jpg  1
data/30601258@N03/landmark_aligned_face.3.10437979845_5985be4b26_o.jpg  1
data/30601258@N03/landmark_aligned_face.2.11816644924_075c3d8d59_o.jpg  1
data/30601258@N03/landmark_aligned_face.4.10424595844_1009c687e4_o.jpg  0
...

I have tried using .flow_from_directory(), but found that it expects the directory parameter to point to a folder whose subfolders are named after the labels, rather than to images spread across arbitrary subfolders.

So, the question is: how can I point the generator at the correct paths of the images in the subfolders?

Upvotes: 1

Views: 1102

Answers (2)

Todayisagreatday

Reputation: 43

EDIT: I was calling the wrong function. .flow_from_directory() is meant for folders that are named after their labels. .flow_from_dataframe() is the suitable method in this case.

I imported the .txt file as a dataframe using pandas' pd.read_csv():

# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split

# Load dataset as dataframe
df = pd.read_csv("aligned_gender.txt", sep='\t')
# Train test split
train_df, test_df = train_test_split(df, test_size=0.2)
# Output of train_df.head()
datadir label
data/30601258@N03/landmark_aligned_face.2.10424815813_e94629b1ec_o.jpg  1
data/30601258@N03/landmark_aligned_face.3.10437979845_5985be4b26_o.jpg  1
data/30601258@N03/landmark_aligned_face.2.11816644924_075c3d8d59_o.jpg  1
data/30601258@N03/landmark_aligned_face.4.10424595844_1009c687e4_o.jpg  0
...

I was also missing one argument: class_mode='raw', which passes the values in the y_col column through unchanged as the targets.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load images using Keras ImageDataGenerator
datagen_train = ImageDataGenerator(rescale=1./255)
train_generator = datagen_train.flow_from_dataframe(
    dataframe=train_df,
    x_col='datadir',
    y_col='label',
    batch_size=128,
    seed=7,
    shuffle=True,
    class_mode='raw',
    target_size=(224,224),
)
# Output 
>>> Found 9755 validated image filenames.
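
One thing worth checking if the "Found ..." count is lower than the number of rows in the dataframe: as far as I know, flow_from_dataframe() skips rows whose file it cannot find and only prints a warning. A minimal sketch for checking the paths up front, assuming the values in the datadir column are relative to some root folder (split_existing and root are hypothetical names of mine, not part of Keras):

```python
from pathlib import Path


def split_existing(rel_paths, root="."):
    """Split relative paths into those that exist under `root` and those that do not."""
    root = Path(root)
    found, missing = [], []
    for rel in rel_paths:
        # Append to `found` if the file is on disk, otherwise to `missing`.
        (found if (root / rel).is_file() else missing).append(rel)
    return found, missing
```

For example, found, missing = split_existing(train_df['datadir']) would show which entries do not resolve from the current working directory before the generator is built.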

Upvotes: 0

mb0850

Reputation: 593

The .flow_from_directory() method of the Keras ImageDataGenerator is useful when your data is divided into sub-folders based on their labels.

For example, suppose you are trying to classify cats versus dogs. You could keep all the cat images in a cats sub-directory and all the dog images in a dogs sub-directory. The .flow_from_directory() method would then take the images from the sub-folders and set their classes accordingly.

From what you are saying, your labels are listed in a text file, so the sub-directories don't matter.

What you could do is read the text file, which holds the filename and label for each image, iterate through the filenames, and manually load each image into your data. Check out the Pillow library for reading image data.
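
A rough sketch of that manual approach, assuming every line of the label file is a relative image path followed by whitespace and an integer label, with no header row. The function names, the root argument and the 224x224 size are my own choices for illustration, not anything the dataset prescribes:

```python
from pathlib import Path


def read_label_file(label_path):
    """Parse lines of the form '<relative path> <label>' into (path, label) pairs."""
    pairs = []
    for line in Path(label_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the path itself stays intact.
        rel_path, label = line.rsplit(None, 1)
        pairs.append((rel_path, int(label)))
    return pairs


def load_images(pairs, root=".", size=(224, 224)):
    """Load every image with Pillow and return (images, labels) as NumPy arrays."""
    # Imported here so read_label_file() works without these packages installed.
    import numpy as np
    from PIL import Image

    images, labels = [], []
    for rel_path, label in pairs:
        with Image.open(Path(root) / rel_path) as img:
            img = img.convert("RGB").resize(size)
            # Scale pixel values to the [0, 1] range.
            images.append(np.asarray(img, dtype="float32") / 255.0)
        labels.append(label)
    return np.stack(images), np.array(labels)
```

Note that this keeps the whole dataset in memory at once, which may not be practical for a large collection. A generator that loads images batch by batch avoids that.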

Upvotes: 1
