Chanda Steven

Reputation: 27

Read images from sub-folders using keras flow_from_dataframe

How can I read images arranged in subfolders using the flow_from_dataframe function, rather than flow_from_directory, in Keras? The dataset directory contains subfolders with the images, and a CSV file holds the image ids and their labels (the classes). Here is the code I used, followed by its output.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
import pandas as pd

def append_ext(fn):
    return fn+".png"

traindf=pd.read_csv("trainLabels.csv",dtype=str)
print(traindf)

traindf["id"]=traindf["id"].apply(append_ext)
print(traindf)

datagen=ImageDataGenerator(rescale=1./255.,validation_split=0.25)

train_generator=datagen.flow_from_dataframe(
    dataframe=traindf,
    directory="./testdf/",
    x_col="id",
    y_col="label",
    subset="training",
    batch_size=32,
    seed=42,
    shuffle=True,
    classes=["animal_1", "animal_2"],
    class_mode="categorical",
    target_size=(32,32))

valid_generator=datagen.flow_from_dataframe(
    dataframe=traindf,
    directory="./testdf/",
    x_col="id",
    y_col="label",
    subset="validation",
    batch_size=32,
    seed=42,
    shuffle=True,
    classes=["animal_1", "animal_2"],
    class_mode="categorical",
    target_size=(32,32))

Found 0 validated image filenames belonging to 2 classes.
Found 0 validated image filenames belonging to 2 classes.

Thanks!

Upvotes: 1

Views: 838

Answers (1)

Gerry P

Reputation: 8092

If I understand the directory structure correctly, it looks like this:

traindf
------ animal_1
       --------frogs_cars_etc
               ----------------- 1.png
               ----------------- 2.png
               ----------------- etc
               ----------------- 10.png
------ animal_2
       --------frogs_cars_etc
               ----------------- 1.png
               ----------------- 2.png
               ----------------- etc
               ----------------- 10.png

It looks to me like there are only 2 classes and 20 image files in total in the dataset. The CSV file you referenced therefore seems to have no correlation with the actual data. You can create your own dataframe with the code below, but with so few samples I doubt a model is going to train at all.

import os
import pandas as pd

data_dir = r'.\traindf'  # main directory
filepaths = []  # full path of each image file
labels = []     # class label of each image file
classlist = os.listdir(data_dir)  # should yield ['animal_1', 'animal_2']; these are the classes
for klass in classlist:
    classpath = os.path.join(data_dir, klass, 'frogs_cars_etc')  # folder that holds the images
    file_list = os.listdir(classpath)  # list of files in that folder
    for f in file_list:
        fpath = os.path.join(classpath, f)  # full path to the file
        filepaths.append(fpath)  # save the filepath
        labels.append(klass)     # save the label
Fseries = pd.Series(filepaths, name='filepaths')
Lseries = pd.Series(labels, name='labels')
df = pd.concat([Fseries, Lseries], axis=1)  # dataframe with columns: filepaths, labels
print(df.head())
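
If the inner folder name is not always 'frogs_cars_etc', the same idea can be written without hardcoding it. The following is only a sketch under my own assumptions: the helper name collect_labeled_images and the set of image extensions are made up for illustration, and the label is assumed to be the first-level folder under the main directory.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the actual dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def collect_labeled_images(root):
    """Return (filepaths, labels) for every image found anywhere below root.

    Hypothetical helper, not part of Keras or pandas. The label of an image
    is the name of the first-level folder it sits in, so
    root/animal_1/any/depth/1.png gets the label "animal_1".
    """
    root = Path(root).resolve()
    filepaths, labels = [], []
    for path in sorted(root.rglob("*")):
        # Skip folders and non-image files such as a CSV.
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        relative = path.relative_to(root)
        if len(relative.parts) < 2:
            continue  # a file directly in root has no class folder
        filepaths.append(str(path))       # absolute path to the image
        labels.append(relative.parts[0])  # first-level folder is the class
    return filepaths, labels
```

The two lists can then go into a dataframe in the same way as above, for example pd.DataFrame({"filepaths": filepaths, "labels": labels}).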

You can then pass this dataframe to flow_from_dataframe, with x_col='filepaths' and y_col='labels', but again, with only 20 images it will not be very useful.
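
For completeness, here is a rough sketch of how such a dataframe might be fed to flow_from_dataframe. It assumes the filepaths column holds full paths to the image files, so directory is left as None and Keras does not prepend anything to the names. The function name make_generators and its defaults are my own, copied from the values in the question. As far as I know the training/validation subsets are cut by row position, so the sketch shuffles the rows first; otherwise a dataframe sorted by class could give a validation set with a single class.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def make_generators(df, x_col="filepaths", y_col="labels",
                    target_size=(32, 32), batch_size=32, seed=42):
    """Build training and validation generators from a dataframe of full paths.

    Hypothetical helper for illustration; df is a pandas DataFrame.
    """
    # Shuffle the rows so both subsets see a mix of classes.
    df = df.sample(frac=1, random_state=seed).reset_index(drop=True)

    datagen = ImageDataGenerator(rescale=1.0 / 255.0, validation_split=0.25)
    common = dict(
        dataframe=df,
        directory=None,  # x_col already holds full paths
        x_col=x_col,
        y_col=y_col,
        class_mode="categorical",
        target_size=target_size,
        batch_size=batch_size,
        shuffle=True,
        seed=seed,
    )
    train_gen = datagen.flow_from_dataframe(subset="training", **common)
    valid_gen = datagen.flow_from_dataframe(subset="validation", **common)
    return train_gen, valid_gen
```

Calling train_generator, valid_generator = make_generators(df) should then report the number of image files found instead of 0, provided the paths in the dataframe point at existing files.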

Upvotes: 1
