Deshwal

Reputation: 4162

Loading images in Keras for CNN from directory but label in CSV file

I have a set of image files in the directory train_images = './data/images', and their labels in train_labels = './data/labels.csv'.

For example, there are 1000 images in train_images named 377.jpg, 17814.jpg, and so on, and the class each image belongs to is stored in a separate CSV file.

EDIT: Here are a few rows from the CSV file:

    ID          Class
0   377.jpg     MIDDLE
1   17814.jpg   YOUNG
2   21283.jpg   MIDDLE
3   16496.jpg   YOUNG
4   4487.jpg    MIDDLE

Here ID is the image file name and Class is the class that image belongs to.
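A minimal sketch of reading such a file with pandas, assuming the CSV on disk has a header row `ID,Class` (the leading 0, 1, 2, ... in the listing above look like the index pandas prints, not a real column). The sample rows are inlined with io.StringIO so the snippet runs on its own; in practice you would pass './data/labels.csv' instead.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
# Assumption: the real labels.csv has the header "ID,Class".
csv_text = """ID,Class
377.jpg,MIDDLE
17814.jpg,YOUNG
21283.jpg,MIDDLE
16496.jpg,YOUNG
4487.jpg,MIDDLE
"""

# dtype=str keeps file names and class labels as strings, which is what
# flow_from_dataframe expects in x_col and (for class_mode="categorical") y_col.
labels_df = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(labels_df.head())
print(labels_df["Class"].value_counts())
```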

I could have used the usual

ImageDataGenerator().flow_from_directory(train_images, class_mode='binary', batch_size=64)

but the problem is that the labels are in a CSV file. I could use os to rename/move the files into one subdirectory per class and then load them that way, but that seems clumsy.

How can I load this data in Keras for a CNN, where each image has dimensions (h, w, c)?

Upvotes: 3

Views: 10887

Answers (2)

Here's my example using the flow_from_dataframe method of ImageDataGenerator, with pandas to read the CSV. The CSV I was using had two columns:

x_col="Image"
y_col="Id"

So the first column is the filename (e.g. xxxx.jpg) and the second column is the class. Since this data is from the Kaggle humpback whale challenge, the class says which whale it is. The image files are in the directory "../input/humpback-whale-identification/train/".

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Conv2D, Flatten, Dropout, MaxPooling2D, BatchNormalization
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import regularizers, optimizers
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

First, read the CSV using pandas:

traindf = pd.read_csv('../input/humpback-whale-identification/train.csv', dtype=str)
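Before building the generator it can help to check that every file named in the CSV actually exists. With its default validate_filenames=True, flow_from_dataframe skips missing or invalid files with a warning, so a typo in the CSV shrinks the dataset rather than failing. A minimal sketch, where find_missing_files is a hypothetical helper (not part of Keras or pandas) and the demo uses a temporary directory with made-up file names:

```python
import os
import tempfile

import pandas as pd


def find_missing_files(df, directory, x_col):
    """Return the names in df[x_col] that are not regular files in directory.

    Hypothetical helper for illustration; not part of Keras or pandas.
    """
    exists = df[x_col].map(lambda name: os.path.isfile(os.path.join(directory, name)))
    return df.loc[~exists, x_col].tolist()


# Self-contained demo: a temporary directory holding two of the three listed files.
with tempfile.TemporaryDirectory() as image_dir:
    for name in ["a.jpg", "b.jpg"]:
        open(os.path.join(image_dir, name), "wb").close()  # empty placeholder files

    demo_df = pd.DataFrame({"Image": ["a.jpg", "b.jpg", "c.jpg"], "Id": ["w1", "w2", "w1"]})
    missing = find_missing_files(demo_df, image_dir, x_col="Image")

print(missing)  # ['c.jpg']
```

With the answer's variables the call would be find_missing_files(traindf, "../input/humpback-whale-identification/train/", "Image").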

Then create the generator with ImageDataGenerator:

datagen = ImageDataGenerator(rescale=1./255., validation_split=0.25)
train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory="../input/humpback-whale-identification/train/",
    x_col="Image",
    y_col="Id",
    subset="training",
    batch_size=32,
    seed=42,
    shuffle=True,
    class_mode="categorical",
    target_size=(100, 100))

Sometimes the filename/ID in the CSV doesn't include an extension. In that case, I used the following to append one (do this before calling flow_from_dataframe, since x_col must contain the actual file names):

def append_ext(fn):
    return fn+".jpg"

traindf["Image"]=traindf["Image"].apply(append_ext)
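If only some of the names lack an extension, appending unconditionally would produce names like xxxx.jpg.jpg. A guarded variant is sketched below; ensure_ext is a hypothetical helper and the sample names are made up for illustration.

```python
import pandas as pd


def ensure_ext(name, ext=".jpg"):
    """Append ext unless the name already ends with it (case-insensitive).

    Hypothetical helper; a guarded variant of the answer's append_ext.
    """
    return name if name.lower().endswith(ext.lower()) else name + ext


# Made-up sample names: one without an extension, two with one.
traindf = pd.DataFrame({"Image": ["abc", "def.jpg", "ghi.JPG"]})
traindf["Image"] = traindf["Image"].apply(ensure_ext)

print(traindf["Image"].tolist())  # ['abc.jpg', 'def.jpg', 'ghi.JPG']
```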

I hope that helps! It's my first try at answering a question here :-)

The Kaggle dataset/challenge is here: https://www.kaggle.com/c/humpback-whale-identification

Note: I've seen people do this in all kinds of ways on Kaggle, but this seems the easiest!

Upvotes: 9

abhilb

Reputation: 5757

You can use pandas to read the CSV file into a DataFrame with the read_csv function:

import pandas as pd

df = pd.read_csv('csvfilename', delimiter=',')

Then use the flow_from_dataframe method of the ImageDataGenerator class.

There is a tutorial at this link

flow_from_dataframe(dataframe, directory=None, x_col='filename', y_col='class', weight_col=None, target_size=(256, 256), color_mode='rgb', classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None, save_prefix='', save_format='png', subset=None, interpolation='nearest', validate_filenames=True)
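For a dataset small enough to fit in memory, one alternative (not from either answer) is to skip the generator and load every image into a single NumPy array of shape (n, h, w, c), with integer labels taken from the CSV's Class column. A minimal sketch, assuming Pillow and NumPy are installed; load_images_from_csv is a hypothetical helper, and the demo writes dummy gray images under the question's file names.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from PIL import Image


def load_images_from_csv(df, directory, x_col, y_col, size=(64, 64)):
    """Load every image listed in df into one array of shape (n, h, w, c).

    Hypothetical helper for illustration; not a Keras function.
    Returns (images, integer label codes, sorted class names).
    """
    images = []
    for name in df[x_col]:
        with Image.open(os.path.join(directory, name)) as img:
            # convert("RGB") forces c = 3; PIL's resize takes (width, height).
            resized = img.convert("RGB").resize((size[1], size[0]))
            images.append(np.asarray(resized, dtype=np.float32) / 255.0)
    codes, class_names = pd.factorize(df[y_col], sort=True)
    return np.stack(images), codes, list(class_names)


# Demo with dummy images named like the question's CSV rows.
with tempfile.TemporaryDirectory() as image_dir:
    df = pd.DataFrame({"ID": ["377.jpg", "17814.jpg", "21283.jpg"],
                       "Class": ["MIDDLE", "YOUNG", "MIDDLE"]})
    for name in df["ID"]:
        Image.new("RGB", (50, 40), (128, 128, 128)).save(os.path.join(image_dir, name))
    X, y, class_names = load_images_from_csv(df, image_dir, "ID", "Class", size=(64, 64))

print(X.shape)                  # (3, 64, 64, 3)
print(y.tolist(), class_names)  # [0, 1, 0] ['MIDDLE', 'YOUNG']
```

X and y could then be passed to model.fit directly (with y one-hot encoded for a categorical_crossentropy loss). The trade-off is memory: every image is decoded up front, whereas flow_from_dataframe reads batches from disk as needed.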

Upvotes: 3
