Reputation: 1

Build CNN model with keras for uneven training and testing data of folders of images

I have two folders of images, one for training and one for testing, but they contain different labels, like this:

training-
         |-a  -img1.png
               img2.png
         |-as -img1.png
               img2.png
         |-are-img1.png
testing -
         |-as -img1.png
         |-and-img1.png
               img1.png

How can I create ytrain and ytest with this dataset?

I tried the following code:

datagen = ImageDataGenerator(rescale=1. / 255)  
generator = datagen.flow_from_directory(train_data_dir,  
target_size=(img_width, img_height),  
batch_size=batch_size,  
class_mode=None,  
shuffle=False)  

nb_train_samples = len(generator.filenames)  
num_classes = len(generator.class_indices)

Found 316 images belonging to 68 classes.

generator = datagen.flow_from_directory(  
test_data_dir,  
target_size=(img_width, img_height),  
batch_size=batch_size,  
class_mode=None,  
shuffle=False)  
nb_test_samples = len(generator.filenames)

Found 226 images belonging to 48 classes.

Is this the correct way to do the labelling? The two datasets contain different folder names: (a, as, are) and (as, and).

When I build the model, I get 0% accuracy:

model = Sequential()  
model.add(Flatten(input_shape=train_data.shape[1:]))  
model.add(Dense(256, activation='relu'))  
model.add(Dropout(0.5))  
model.add(Dense(num_classes, activation='sigmoid'))  

model.compile(optimizer='rmsprop',  
          loss='categorical_crossentropy', metrics=['accuracy'])  

history = model.fit(train_data, train_labels,
                    epochs=epochs, batch_size=batch_size,
                    validation_data=(test_data, test_labels))

model.save_weights(top_model_weights_path)  

(eval_loss, eval_accuracy) = model.evaluate(  
 test_data, test_labels, batch_size=batch_size, verbose=1)

Upvotes: 0

Answers (2)

Andrew - OpenGeoCode

Reputation: 2287

Gap is pretty flexible for these types of issues. My favorite way to combine a separated Training and Test dataset is to use Gap's dataset merge feature (+= operator) as follows:

# load the images from the Training directory
images = Images('name_of_dataset', 'training', config=['resize=(224,224)', 'store'])

# load the images from the Testing directory and merge them with the Training data
images += Images('name_of_dataset', 'testing', config=['resize=(224,224)', 'store'])

Upvotes: 1

virtualdvid

Reputation: 2421

I would recommend merging both data sets, shuffling them, and then splitting them again so that the train and test data sets have the same labels. That is the correct way of labeling, because the model needs to "see" all the possible labels during training before it can be compared against the test data set.
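As a rough illustration of the merge, shuffle and re-split idea, independent of any library, here is a minimal sketch that works on plain lists of (path, label) pairs. The function name `merge_and_split`, the `test_fraction` value and the example paths are all made up for illustration; it only shows the principle that every label present in the test split must also be present in the train split.

```python
import random
from collections import defaultdict

def merge_and_split(train_items, test_items, test_fraction=0.2, seed=0):
    """Merge two lists of (path, label) pairs, shuffle, and re-split per label."""
    # Pool every image under its label, regardless of which folder it came from.
    by_label = defaultdict(list)
    for path, label in list(train_items) + list(test_items):
        by_label[label].append(path)

    rng = random.Random(seed)
    new_train, new_test = [], []
    for label in sorted(by_label):
        paths = by_label[label]
        rng.shuffle(paths)
        # Hold out a share of each label, but always keep at least one
        # image of that label in the training split.
        n_test = min(round(len(paths) * test_fraction), len(paths) - 1)
        new_test += [(p, label) for p in paths[:n_test]]
        new_train += [(p, label) for p in paths[n_test:]]

    rng.shuffle(new_train)
    rng.shuffle(new_test)
    return new_train, new_test

# Hypothetical file lists mirroring the folder layout in the question.
train_items = [("training/a/img1.png", "a"), ("training/a/img2.png", "a"),
               ("training/as/img1.png", "as"), ("training/as/img2.png", "as"),
               ("training/are/img1.png", "are")]
test_items = [("testing/as/img1.png", "as"), ("testing/and/img1.png", "and")]

new_train, new_test = merge_and_split(train_items, test_items, test_fraction=0.34)
train_labels = {label for _, label in new_train}
test_labels = {label for _, label in new_test}
```

With only one image for some labels, those labels end up in the training split only, which is usually what you want: the model cannot be evaluated on a label it has never been trained on.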

For this you can use gapcv:

Install the library:

pip install gapcv

Mix the folders:

from gapcv.utils.img_tools import ImgUtils
gap = ImgUtils(root_path='root_folder{}/training'.format('_t2'))
gap.transf='2to1'
gap.transform()

This will create a folder with the following structure:

root_folder-
         |-a  -img1.png
               img2.png
         |-as -img1.png
               img2.png
         |-are-img1.png
         |-and-img1.png
               img1.png
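If you would rather not depend on a library for this step, the same kind of merged folder can be produced with the standard library. This is only a sketch of the idea and not how gapcv does it internally; `merge_class_folders` and the file-name prefixing scheme are my own assumptions, chosen so that files with the same name from both sources do not overwrite each other.

```python
import shutil
import tempfile
from pathlib import Path

def merge_class_folders(source_dirs, dest_dir):
    """Copy every class subfolder of each source dir into dest_dir."""
    dest = Path(dest_dir)
    for source in map(Path, source_dirs):
        for class_dir in sorted(p for p in source.iterdir() if p.is_dir()):
            target = dest / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for image in sorted(p for p in class_dir.iterdir() if p.is_file()):
                # Prefix with the source folder name so that, e.g.,
                # training/as/img1.png and testing/as/img1.png do not collide.
                shutil.copy2(image, target / f"{source.name}_{image.name}")
    return dest

# Tiny demonstration on a throwaway directory tree.
root = Path(tempfile.mkdtemp())
for rel in ["training/a/img1.png", "training/as/img1.png", "training/are/img1.png",
            "testing/as/img1.png", "testing/and/img1.png"]:
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(b"")  # empty placeholder standing in for an image

merged = merge_class_folders([root / "training", root / "testing"],
                             root / "root_folder")
merged_classes = sorted(p.name for p in merged.iterdir())
```

The originals are copied rather than moved, so the training and testing folders stay untouched if something goes wrong.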

Option 1

Use gapcv to pre-process your data set into a shareable h5 file and use it to feed images into your Keras model:

import os
from gapcv.vision import Images  # assumed import path; the original snippet omits it

if not os.path.isfile('name_data_set.h5'):
    # this will create the `h5` file if it doesn't exist
    images = Images('name_data_set', 'root_folder', config=['resize=(224,224)', 'store'])

# this will stream the data from the `h5` file so you don't overload your memory
# augment only if needed, otherwise use just Images(config=['stream'])
# pixel values are normalized by 1.0/255.0 by default
images = Images(config=['stream'], augment=['flip=both', 'edge', 'zoom=0.3', 'denoise'])
images.load('name_data_set')

#Metadata

print('images train')
print('Time to load data set:', images.elapsed)
print('Number of images in data set:', images.count)
print('classes:', images.classes)

Generator:

images.split = 0.2
images.minibatch = 32
gap_generator = images.minibatch
X_test, Y_test = images.test

Fit the Keras model:

model.fit_generator(generator=gap_generator,
                    validation_data=(X_test, Y_test),
                    epochs=epochs,
                    steps_per_epoch=steps_per_epoch)

Why use gapcv? It fits the model about twice as fast as ImageDataGenerator() :)

Option 2

Use gapcv to shuffle and split the data set with equal labels:

gap = ImgUtils(root_path='root_folder')

# Tree 2
gap.transform(shufle=True, img_split=0.2)

Then keep using Keras ImageDataGenerator() as usual.
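Before going back to ImageDataGenerator(), it can be worth checking that both folders really expose the same class subfolders, and pinning one explicit label order for both. The sketch below is an assumption-laden helper of my own (`shared_class_list` is not a Keras or gapcv function); it just collects the sorted union of subfolder names using the standard library.

```python
import os
import tempfile

def shared_class_list(*data_dirs):
    """Return the sorted union of class subfolder names across all directories."""
    names = set()
    for data_dir in data_dirs:
        for entry in os.listdir(data_dir):
            if os.path.isdir(os.path.join(data_dir, entry)):
                names.add(entry)
    return sorted(names)

# Throwaway folders mirroring the layout from the question.
root = tempfile.mkdtemp()
for rel in ["training/a", "training/as", "training/are",
            "testing/as", "testing/and"]:
    os.makedirs(os.path.join(root, rel))

classes = shared_class_list(os.path.join(root, "training"),
                            os.path.join(root, "testing"))
# One label-to-index mapping shared by both data sets.
class_indices = {name: index for index, name in enumerate(classes)}
```

The resulting list can be passed as the `classes` argument of `flow_from_directory` for both the train and the test generator, so that the same folder name maps to the same index in both. Note also that `class_mode=None`, as used in the question, makes the generator return images without labels; `class_mode='categorical'` is what produces one-hot labels for `categorical_crossentropy`.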

Documentation:

Training notebook to mix and split folders.
gapcv documentation.

Let me know how it goes. :)

Upvotes: 2
