I'm having trouble loading the Adience image dataset with Keras's ImageDataGenerator. The image data (.jpg) is spread across multiple subfolders named after user IDs:
directory/
    7153718@N04/
        landmark_aligned_face.2282.11597961815_4916cbf003_o.jpg
        landmark_aligned_face.2282.11598013005_240c2bc9c7_o.jpg
        ...
    7285955@N06/
        landmark_aligned_face.2049.9486667267_73ac31c862_o.jpg
        landmark_aligned_face.2050.9486613949_909254ccf9_o.jpg
        ...
The labels for the images are stored in a label.txt file with the following format:
data/30601258@N03/landmark_aligned_face.2.10424815813_e94629b1ec_o.jpg 1
data/30601258@N03/landmark_aligned_face.3.10437979845_5985be4b26_o.jpg 1
data/30601258@N03/landmark_aligned_face.2.11816644924_075c3d8d59_o.jpg 1
data/30601258@N03/landmark_aligned_face.4.10424595844_1009c687e4_o.jpg 0
...
I have tried using .flow_from_directory(), but found that its directory parameter seems to expect all the images in one folder rather than spread across multiple subfolders.
So my question is: how can I point the generator at the correct paths of the images inside the subfolders?
EDIT: I was calling the wrong function. .flow_from_directory() is meant for data that is already sorted into one folder per label; .flow_from_dataframe() is the right method in this case.
I imported the .txt file as a dataframe using pandas' pd.read_csv():
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
# Load dataset as dataframe
df = pd.read_csv("aligned_gender.txt", sep='\t')
# Train test split
train_df, test_df = train_test_split(df, test_size=0.2)
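The read_csv() call above assumes a tab-separated file with a `datadir`/`label` header row. The label.txt excerpt in the question has no header and appears to separate path and label with whitespace. If your file looks like that excerpt, a minimal sketch like the following (assuming whitespace-separated columns and no header; the `load_labels` helper name is hypothetical) produces the same two columns. The sample rows are copied from the question.

```python
import io

import pandas as pd

# Sample rows copied from the label.txt excerpt in the question.
# In practice, pass the file path to read_csv instead of a StringIO buffer.
LABEL_TXT = """\
data/30601258@N03/landmark_aligned_face.2.10424815813_e94629b1ec_o.jpg 1
data/30601258@N03/landmark_aligned_face.3.10437979845_5985be4b26_o.jpg 1
data/30601258@N03/landmark_aligned_face.2.11816644924_075c3d8d59_o.jpg 1
data/30601258@N03/landmark_aligned_face.4.10424595844_1009c687e4_o.jpg 0
"""


def load_labels(source):
    # Hypothetical helper: the whitespace regex accepts spaces or tabs between
    # the two columns, header=None says the file has no header row, and
    # names= supplies the column names used in the generator code.
    return pd.read_csv(source, sep=r"\s+", header=None, names=["datadir", "label"])


df = load_labels(io.StringIO(LABEL_TXT))
print(df.shape)              # (4, 2)
print(df["label"].tolist())  # [1, 1, 1, 0]
```

Keeping the labels as integers pairs with class_mode='raw'. If you switch to class_mode='binary', Keras expects the label column to contain strings, so you would cast it first with df["label"] = df["label"].astype(str).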
# Output of train_df.head()
datadir label
data/30601258@N03/landmark_aligned_face.2.10424815813_e94629b1ec_o.jpg 1
data/30601258@N03/landmark_aligned_face.3.10437979845_5985be4b26_o.jpg 1
data/30601258@N03/landmark_aligned_face.2.11816644924_075c3d8d59_o.jpg 1
data/30601258@N03/landmark_aligned_face.4.10424595844_1009c687e4_o.jpg 0
...
I was also missing one argument: class_mode='raw', which makes the generator return the values in the label column as-is instead of treating them as class names.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Load images using Keras ImageDataGenerator
datagen_train = ImageDataGenerator(rescale=1./255)
train_generator = datagen_train.flow_from_dataframe(
dataframe=train_df,
x_col='datadir',
y_col='label',
batch_size=128,
seed=7,
shuffle=True,
class_mode='raw',
target_size=(224,224),
)
# Output
>>> Found 9755 validated image filenames.
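By default, flow_from_dataframe() validates the filenames in x_col (its validate_filenames argument) and ignores invalid ones, so the "validated image filenames" count can be lower than len(train_df) if some paths don't resolve. A minimal sketch for finding which rows point at files that don't exist, using only pandas and os. The `split_missing` helper and the demo file names are hypothetical. Relative paths are joined to `base_dir` when given, otherwise resolved against the current working directory.

```python
import os
import tempfile

import pandas as pd


def split_missing(df, x_col="datadir", base_dir=None):
    """Return (present, missing) DataFrames based on whether each file exists."""
    # Join relative paths to base_dir if one is given; otherwise leave them
    # relative to the current working directory.
    paths = df[x_col].map(lambda p: os.path.join(base_dir, p) if base_dir else p)
    exists = paths.map(os.path.isfile)  # boolean Series, aligned with df
    return df[exists], df[~exists]


# Tiny self-contained demo: only the first of two listed (hypothetical) images exists.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data", "7153718@N04"))
    present_rel = "data/7153718@N04/a.jpg"
    open(os.path.join(root, present_rel), "wb").close()
    demo = pd.DataFrame({
        "datadir": [present_rel, "data/7153718@N04/missing.jpg"],
        "label": [1, 0],
    })
    present, missing = split_missing(demo, base_dir=root)
    print(len(present), len(missing))  # 1 1
```

Note that this only checks existence; Keras's own validation also rejects files with unsupported extensions.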
The .flow_from_directory() method of the Keras ImageDataGenerator is useful when your data is divided into sub-folders based on their labels.
For example, suppose you are trying to classify cats and dogs. You could keep all the cat images in a cats sub-directory and all the dog images in a dogs sub-directory; .flow_from_directory() would then read the images from the sub-folders and assign their classes accordingly.
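If you would rather keep using .flow_from_directory(), one option is to copy the Adience images into one sub-folder per label, mirroring the cats/dogs layout. A minimal stdlib-only sketch, assuming each line of the label file is `<relative image path> <label>` as in the question. The `organise_by_label` function, the output folder, and the demo file names are hypothetical, not part of Keras.

```python
import os
import shutil
import tempfile


def organise_by_label(label_file, src_root, dst_root):
    """Copy each image listed in label_file into dst_root/<label>/.

    Lines that don't have exactly two fields, or whose image file doesn't
    exist (e.g. a header row), are skipped. Returns the number copied.
    """
    copied = 0
    with open(label_file) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) != 2:
                continue
            rel_path, label = parts
            src = os.path.join(src_root, rel_path)
            if not os.path.isfile(src):
                continue
            label_dir = os.path.join(dst_root, label)
            os.makedirs(label_dir, exist_ok=True)
            # Prefix with the user-ID folder name so identical file names
            # from different users cannot overwrite each other.
            user_id = os.path.basename(os.path.dirname(rel_path))
            dst = os.path.join(label_dir, f"{user_id}_{os.path.basename(rel_path)}")
            shutil.copy2(src, dst)
            copied += 1
    return copied


# Demo with two hypothetical empty files in two user-ID folders.
with tempfile.TemporaryDirectory() as tmp:
    src_root = os.path.join(tmp, "adience")
    for rel in ["data/u1@N01/x.jpg", "data/u2@N02/y.jpg"]:
        path = os.path.join(src_root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()
    label_file = os.path.join(tmp, "label.txt")
    with open(label_file, "w") as fh:
        fh.write("data/u1@N01/x.jpg 1\ndata/u2@N02/y.jpg 0\n")
    dst_root = os.path.join(tmp, "by_label")
    n = organise_by_label(label_file, src_root, dst_root)
    layout = sorted(os.listdir(dst_root))
    files_1 = os.listdir(os.path.join(dst_root, "1"))
    print(n, layout, files_1)  # 2 ['0', '1'] ['u1@N01_x.jpg']
```

After this, pointing .flow_from_directory() at `dst_root` would let it infer the classes from the `0` and `1` folder names. The trade-off is a full second copy of the dataset on disk.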
Since your labels are listed in a text file, the sub-directory structure doesn't matter.
Instead, you could read the text file that holds the filename and label information, iterate through the filenames, and load each image into your data manually. The Pillow library is a good option for reading image data.
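A minimal sketch of that manual approach with Pillow and NumPy, assuming the label file holds `<path> <label>` per line and resizing every face to 224 x 224 (an arbitrary choice here). The `load_images` function and the demo files are hypothetical. Loading everything into memory at once is only practical for a subset of Adience; for the full dataset a generator scales better.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def load_images(label_file, root=".", size=(224, 224)):
    """Read '<path> <label>' lines and return (images, labels) arrays.

    Images are converted to RGB, resized to `size` (width, height) and
    scaled to [0, 1]. Lines without two fields or with a non-numeric
    label (such as a header row) are skipped.
    """
    images, labels = [], []
    with open(label_file) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) != 2 or not parts[1].isdigit():
                continue
            rel_path, label = parts
            with Image.open(os.path.join(root, rel_path)) as img:
                rgb = img.convert("RGB").resize(size)
                images.append(np.asarray(rgb, dtype=np.float32) / 255.0)
            labels.append(int(label))
    return np.stack(images), np.array(labels)


# Demo: two small solid-colour JPEGs plus a label file with a header row.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data", "u1@N01"))
    for name, colour in [("a.jpg", (255, 0, 0)), ("b.jpg", (0, 0, 255))]:
        Image.new("RGB", (50, 40), colour).save(os.path.join(tmp, "data", "u1@N01", name))
    label_file = os.path.join(tmp, "label.txt")
    with open(label_file, "w") as fh:
        fh.write("datadir label\ndata/u1@N01/a.jpg 1\ndata/u1@N01/b.jpg 0\n")
    X, y = load_images(label_file, root=tmp)
    print(X.shape, y.tolist())  # (2, 224, 224, 3) [1, 0]
```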