Immanuel

Reputation: 49

How to train a model when the images are in a folder and their labels are in a separate CSV file?

I have a dataset in which the image files are stored separately and their labels are given in a separate CSV file, with the image file name in the 1st column and its label in the 2nd column.

|Image   |label     |
|123.jpeg|label name|

The actual 123.jpeg image file is in a separate folder (train).

How do I load a dataset like this and train my machine learning model? I also have another image folder (test) containing the images for testing, and their names are listed in a separate test.csv that contains only the image names:

|Image |label|
|13.jpg|?    |

These are the images whose labels I have to predict. If anyone can explain this with a code structure, it would be easier to understand, since I'm a newbie. Thanks.

Upvotes: 3

Views: 8472

Answers (1)

Aniket Thomas

Reputation: 353

First, load the CSV files, which contain your image names and labels, into dataframes.

import pandas as pd    
train = pd.read_csv(path_to_train_csv_file)
test = pd.read_csv(path_to_test_csv_file)

This loads the CSV files containing your image names and the labels assigned to them. Make sure the label values are strings. The test dataframe will not have usable labels, so only its image name column is needed.

Then define the paths where your train and test folders are located.

train_folder = path_to_train_folder
test_folder = path_to_test_folder
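
Before building the generators, it can help to confirm that every file name in the CSV actually exists in the folder. As far as I know, flow_from_dataframe skips names it cannot find (with a warning), which for the test set would leave you with fewer predictions than rows in test.csv. A minimal sketch using only the standard library; the helper name find_missing_files is hypothetical, not part of any library:

```python
import os


def find_missing_files(file_names, folder):
    """Return the names from file_names that are not regular files in folder."""
    return [name for name in file_names
            if not os.path.isfile(os.path.join(folder, str(name)))]
```

With the variables above, and assuming the image column is called "Image" as in the question, find_missing_files(train["Image"], train_folder) should return an empty list when everything is in place.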

Now you can use the TensorFlow Keras API to load your data. First, define the data generators.

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

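# Define your data generators
train_gen = ImageDataGenerator(
    rotation_range=45,      # random rotations of up to 45 degrees
    rescale=1./255,         # scale pixel values from [0, 255] to [0, 1]
    horizontal_flip=True    # random left-right flips
)
# The test generator only rescales, using the same factor as training
test_gen = ImageDataGenerator(rescale=1./255)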

Note that the test generator only rescales the images and does not augment them, whereas the train generator also applies augmentation techniques such as horizontal_flip and rotation_range.

After creating the data generators, we have to load our data.

train_data = train_gen.flow_from_dataframe(
    dataframe=train,
    directory=train_folder,
    x_col="Image",            # name of your column with the image file names
    y_col="label",            # name of your column with the labels
    batch_size=32,            # example value; use your own batch size
    target_size=(224, 224),   # example (height, width); use your own image size
    class_mode="categorical", shuffle=True, seed=42)

test_data = test_gen.flow_from_dataframe(
    dataframe=test,
    directory=test_folder,
    x_col="Image",            # name of your column with the image file names
    y_col=None,               # the test images have no labels
    batch_size=32,            # example value; use your own batch size
    target_size=(224, 224),   # example (height, width); same size as for training
    class_mode=None, shuffle=False)

Note that in test_data, y_col and class_mode are None because the labels are missing and have to be predicted. shuffle is False so that the images stay in the same order as the rows of the test dataframe.

You can check if they're loaded properly.

imgs, lbl = next(iter(train_data))

Here imgs is your batch of images, which you can visualize, and lbl is the corresponding batch of labels.

This is how you load your train data for training and your test data for prediction.
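
For the prediction step the question asks about: once a model has been trained on train_data, a call such as model.predict(test_data) returns one row of class scores per test image, and train_data.class_indices maps each label name to its column index. Here is a minimal sketch of turning those scores back into label names, written in plain Python so it runs without a trained model. The helper name scores_to_labels and the sample labels are hypothetical, and model stands for whatever Keras model you train:

```python
def scores_to_labels(scores, class_indices):
    """Map each row of class scores to the label with the highest score.

    scores: sequence of rows, one per image (e.g. the output of model.predict).
    class_indices: dict of label name -> column index
                   (e.g. train_data.class_indices).
    """
    index_to_label = {index: label for label, index in class_indices.items()}
    labels = []
    for row in scores:
        row = list(row)
        best = max(range(len(row)), key=row.__getitem__)  # argmax of this row
        labels.append(index_to_label[best])
    return labels


# Made-up scores for two images and two classes, for illustration only.
example = scores_to_labels([[0.1, 0.9], [0.8, 0.2]], {"cat": 0, "dog": 1})
print(example)  # ['dog', 'cat']
```

With the variables from this answer, that would be something like test["label"] = scores_to_labels(model.predict(test_data), train_data.class_indices). This relies on shuffle = False keeping the predictions in the same order as the rows of test, and on no test file having been skipped while loading.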

Upvotes: 5
