Reputation: 49
I have a dataset in which the image files and their labels are stored separately: the labels are in a CSV file whose first column is the image file name and whose second column is the corresponding label.
|Image   |label     |
|123.jpeg|label name|
The actual image 123.jpeg is in a separate folder (train).
How do I load a dataset like this and train my machine learning model? I also have a second image folder (test) containing the images for testing, and their file names are listed in a separate test.csv that has only the image name:
|Image |label|
|13.jpg|?    |
I have to predict the label for these images. If anyone can explain this with a code structure, it would be easier to understand since I'm a newbie. Thanks.
Upvotes: 3
Views: 8472
Reputation: 353
First, load the CSV files into dataframes; the train CSV is the one that contains your labels.
import pandas as pd
train = pd.read_csv(path_to_train_csv_file)
test = pd.read_csv(path_to_test_csv_file)
This loads the CSV files containing your image names and the corresponding labels. Make sure the label values are strings; the test dataframe will not have a label column.
Then define the paths where your train and test folders are located.
train_folder = path_to_train_folder
test_folder = path_to_test_folder
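Before building the generators, it can help to confirm that the labels really are strings and that every file name in the CSV exists in its folder. The helper below is only a sketch: `check_dataframe` is a hypothetical name, and the default column name "Image" is taken from the table in the question.

```python
import os

def check_dataframe(df, folder, image_col="Image", label_col=None):
    """Return a copy of df with string labels, plus the file names missing from folder."""
    df = df.copy()  # leave the caller's dataframe untouched
    if label_col is not None:
        # class_mode="categorical" expects string labels
        df[label_col] = df[label_col].astype(str)
    missing = [
        name for name in df[image_col]
        if not os.path.isfile(os.path.join(folder, name))
    ]
    return df, missing
```

For example, `train, missing = check_dataframe(train, train_folder, label_col="label")` followed by `print(missing)` lists any images named in the CSV that are not in the train folder; for the test dataframe, leave `label_col` as None.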
Now you can use the TensorFlow Keras API to load your data. First define a data generator:
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Define your data generators
# Train generator: rescale pixels to [0, 1] and apply augmentation
train_gen = ImageDataGenerator(
    rotation_range=45,
    rescale=1./255,
    horizontal_flip=True
)
# Test generator: rescale only, no augmentation
test_gen = ImageDataGenerator(rescale=1./255)
Note that the test generator only rescales and does not augment, whereas the train generator also applies augmentation techniques such as horizontal_flip and rotation_range.
After creating the data generators, we have to get our data:
# "Image" and "label" are the column names from the question's CSV; change them to match yours.
# batch_size, img_height and img_width are placeholders for your own values.
train_data = train_gen.flow_from_dataframe(
    dataframe=train,
    directory=train_folder,
    x_col="Image",
    y_col="label",
    seed=42,
    batch_size=batch_size,
    shuffle=True,
    class_mode="categorical",
    target_size=(img_height, img_width))
test_data = test_gen.flow_from_dataframe(
    dataframe=test,
    directory=test_folder,
    x_col="Image",
    y_col=None,
    batch_size=batch_size,
    shuffle=False,  # keep predictions in the same order as the test files
    class_mode=None,
    target_size=(img_height, img_width))
Note that for test_data, y_col and class_mode are None because the labels are missing and have to be predicted.
You can check that the data is loaded properly:
imgs, lbl = next(iter(train_data))
You can visualize imgs, which is your batch of images; similarly, lbl is the corresponding batch of labels.
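Besides plotting, a quick numeric summary of the batch can catch common mistakes such as a wrong rescale factor or an unexpected image size. This is a minimal sketch using NumPy; `summarize_batch` is a hypothetical helper, and it assumes lbl is one-hot encoded, as it is with class_mode="categorical".

```python
import numpy as np

def summarize_batch(imgs, lbl):
    """Summarize one batch: shapes, pixel range, and number of images per class."""
    imgs = np.asarray(imgs)
    lbl = np.asarray(lbl)
    return {
        "image_shape": imgs.shape,   # (batch, height, width, channels)
        "label_shape": lbl.shape,    # (batch, number of classes)
        "pixel_min": float(imgs.min()),
        "pixel_max": float(imgs.max()),
        # summing one-hot rows counts how many images belong to each class
        "images_per_class": lbl.sum(axis=0).astype(int).tolist(),
    }
```

With rescale=1./255 the pixel range should fall within 0 and 1; a pixel_max far above 1 suggests the rescale factor is wrong.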
This is how you load your train data for training and your test data for prediction.
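Once a model has been trained on train_data, its predictions for test_data come back as one row of class probabilities per image, and these still have to be turned into label names. The sketch below assumes a trained Keras model called `model` (not shown here) and uses the `class_indices` dictionary of train_data, which maps each label name to its column index; `decode_predictions` is a hypothetical helper name.

```python
import numpy as np

def decode_predictions(probs, class_indices):
    """Map each row of class probabilities to the name of its most likely label."""
    # class_indices maps label name -> index; invert it to index -> label name
    index_to_label = {idx: label for label, idx in class_indices.items()}
    predicted = np.argmax(probs, axis=1)
    return [index_to_label[int(i)] for i in predicted]
```

A typical use would be `probs = model.predict(test_data)` and then `labels = decode_predictions(probs, train_data.class_indices)`. Because test_data was created with shuffle=False, the labels are in the same order as `test_data.filenames`, so they can be written back next to the image names, assuming no test rows were dropped as invalid file names.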
Upvotes: 5