Reputation: 1
I have two folders of images, one for training and one for testing, but they contain different labels, like this:
training-
    |-a  -img1.png
          img2.png
    |-as -img1.png
          img2.png
    |-are-img1.png
testing -
    |-as -img1.png
    |-and-img1.png
          img1.png
How can I create ytrain and ytest with this dataset?
I tried the following code:
datagen = ImageDataGenerator(rescale=1. / 255)
generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode=None,
    shuffle=False)
nb_train_samples = len(generator.filenames)
num_classes = len(generator.class_indices)
Found 316 images belonging to 68 classes.
generator = datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode=None,
    shuffle=False)
nb_test_samples = len(generator.filenames)
Found 226 images belonging to 48 classes.
Is this the correct way to do the labelling?
I ask because the two datasets contain different folder names: (a, as, are) and (as, and).
When I build the model, I get 0% accuracy:
model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_data, train_labels,
                    epochs=epochs, batch_size=batch_size,
                    test_data=(test_data, test_labels))
model.save_weights(top_model_weights_path)
(eval_loss, eval_accuracy) = model.evaluate(
    test_data, test_labels, batch_size=batch_size, verbose=1)
Upvotes: 0
Views: 713
Reputation: 2287
Gap is pretty flexible for these types of issues. My favorite way to combine a separated Training and Test dataset is to use Gap's dataset merge feature (+= operator) as follows:
# load the images from the Training directory
images = Images('name_of_dataset', 'training', config=['resize=(224,224)', 'store'])
# load the images from the Testing directory and merge them with the Training data
images += Images('name_of_dataset', 'testing', config=['resize=(224,224)', 'store'])
Upvotes: 1
Reputation: 2421
I would recommend merging both datasets, shuffling them, and then splitting them again so that the train and test sets contain the same labels. That's the correct way to label, because the model needs to "see" all the possible labels during training before its predictions can be compared against the test set.
For this you can use gapcv:
Install the library:
pip install gapcv
Mix the folders:
from gapcv.utils.img_tools import ImgUtils
gap = ImgUtils(root_path='root_folder{}/training'.format('_t2'))
gap.transf='2to1'
gap.transform()
This will create a folder with the following structure:
root_folder-
    |-a  -img1.png
          img2.png
    |-as -img1.png
          img2.png
    |-are-img1.png
    |-and-img1.png
          img1.png
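If you would rather not add a dependency for this step, the merge can be approximated with the standard library. This is only a sketch of the idea, not what gapcv does internally: the function name merge_class_folders and the choice to prefix each copied file with its source folder name (so that training/as/img1.png and testing/as/img1.png do not overwrite each other) are my own assumptions.

```python
import shutil
from pathlib import Path


def merge_class_folders(sources, destination):
    """Copy every class sub-folder found in `sources` into `destination`.

    Hypothetical helper: file names are prefixed with the source folder
    name to avoid collisions between identically named images.
    Returns the sorted list of class names present after the merge.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    for source in map(Path, sources):
        for class_dir in sorted(p for p in source.iterdir() if p.is_dir()):
            target_dir = destination / class_dir.name
            target_dir.mkdir(parents=True, exist_ok=True)
            for image in sorted(p for p in class_dir.iterdir() if p.is_file()):
                shutil.copy2(image, target_dir / f"{source.name}_{image.name}")
    return sorted(p.name for p in destination.iterdir() if p.is_dir())
```

Calling merge_class_folders(['training', 'testing'], 'root_folder') on the layout from the question should leave one folder per label (a, and, are, as), each holding the images from both sources.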
Option 1
Use gapcv to pre-process your dataset into a shareable h5 file and use it to feed images into your keras model:
import os
from gapcv.vision import Images

if not os.path.isfile('name_data_set.h5'):
    # this will create the `h5` file if it doesn't exist
    images = Images('name_data_set', 'root_folder', config=['resize=(224,224)', 'store'])
# this will stream the data from the `h5` file so you don't overload your memory.
# Augment only if needed; otherwise use just Images(config=['stream']).
# Normalization is 1.0/255.0 by default.
images = Images(config=['stream'], augment=['flip=both', 'edge', 'zoom=0.3', 'denoise'])
images.load('name_data_set')
#Metadata
print('images train')
print('Time to load data set:', images.elapsed)
print('Number of images in data set:', images.count)
print('classes:', images.classes)
Generator:
images.split = 0.2
images.minibatch = 32
gap_generator = images.minibatch
X_test, Y_test = images.test
Fit the keras model:
model.fit_generator(generator=gap_generator,
                    validation_data=(X_test, Y_test),
                    epochs=epochs,
                    steps_per_epoch=steps_per_epoch)
Why use gapcv? Well, fitting the model with it is about twice as fast as with ImageDataGenerator() :)
Option 2
Use gapcv to shuffle and split the dataset so that both parts have the same labels:
gap = ImgUtils(root_path='root_folder')
# Tree 2
gap.transform(shufle=True, img_split=0.2)
Then keep using keras ImageDataGenerator() as usual.
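To make "shuffle and split with equal labels" concrete, here is a rough standard-library sketch of the idea. It is not gapcv's implementation: the name stratified_split, the seed argument, and the rule that a class with a single image stays entirely in the training list are assumptions of mine. The point is that one shared, sorted list of class names defines the label indices for both lists.

```python
import random
from pathlib import Path


def stratified_split(root, test_fraction=0.2, seed=0):
    """Split root/<class>/<image> into train and test lists of (path, label).

    Hypothetical helper: labels are integer indices into one shared, sorted
    list of class names, so a folder name maps to the same label in both lists.
    """
    rng = random.Random(seed)
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    train, test = [], []
    for label, name in enumerate(class_names):
        images = sorted(p for p in (root / name).iterdir() if p.is_file())
        rng.shuffle(images)
        # Hold out at least one image when the class has two or more;
        # a class with a single image is kept for training only.
        n_test = max(1, round(len(images) * test_fraction)) if len(images) > 1 else 0
        test += [(str(p), label) for p in images[:n_test]]
        train += [(str(p), label) for p in images[n_test:]]
    return class_names, train, test
```

With this kind of split every label in the test list also occurs in the training list. If you stay with keras, the class_names list could likewise be passed as the classes argument of flow_from_directory for both generators, together with class_mode='categorical', so that the two generators agree on the label indices.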
Documentation:
Training notebook to mix and split folders.
gapcv documentation.
Let me know how it goes. :)
Upvotes: 2