user10908656

CNN: Divide images into training/validation/testing

I'm trying to divide my images (a dataset of bunnies and dogs) into x_train, x_val, y_train, y_val, and a test set.

The following is what I did:

I placed the photos of each class (dogs/bunnies) in separate folders inside two folders: training and testing.

Training directory-> Bunny directory -> bunny images

Training directory-> Puppy directory -> puppy images

Testing directory-> Bunny directory -> bunny images

Testing directory-> Puppy directory -> puppy images

I used the following code to get the images from the folders:

training_data = train_datagen.flow_from_directory('./images/train',
                                             target_size = (28, 28),
                                             batch_size = 86,
                                             class_mode = 'binary',
                                             color_mode='rgb',
                                             classes=None)


test_data = test_datagen.flow_from_directory('./images/test',
                                        target_size = (28, 28),
                                        batch_size = 86,
                                        class_mode = 'binary',
                                        color_mode='rgb',
                                        classes=None)

Which gives me the following output:

Found 152 images belonging to 2 classes.

Found 23 images belonging to 2 classes.

Question 1: I'm not sure how to define my labels here (y_train/y_val), or whether I need to at all; most of the models I've seen define y_train/y_val.

Question 2: I tried to run

x_train, x_val = train_test_split(training_data, test_size=0.1)

to at least split my training data into training and validation sets. But when I then ran my model with the call below, it raised the error that follows:

history=classifier.fit_generator(x_train,
                     steps_per_epoch = (8000 / 86),
                     epochs = 2,
                     validation_data = x_val,
                     validation_steps = 8000/86,
                     callbacks=[learning_rate_reduction])

ValueError: validation_data should be a tuple (val_x, val_y, val_sample_weight) or (val_x, val_y).

Found: [(array([[[[0.5058095 , 0.46913707, 0.42369673],...

Upvotes: 0

Views: 5223

Answers (1)

Arkady. A

Reputation: 545

Question 1:

From my experience, there are no real restrictions on how you name the x and y variables. For example, in this kernel the author uses y_train and y_test for the labels, and here the author uses train_Y. The general rule is to choose names that show what the variable holds.
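As for whether the labels need to be defined at all: with flow_from_directory they are inferred from the class subdirectory names, taken in alphanumeric order. Below is a minimal sketch that imitates that mapping in plain Python. It is not the Keras implementation; the helper infer_class_indices and the folder names bunny and puppy are hypothetical, chosen to mirror the layout in the question.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Map each class subdirectory to an integer label, in alphanumeric order."""
    class_names = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a throwaway directory tree shaped like the one in the question
# (folder names are assumptions, not taken from the asker's disk).
with tempfile.TemporaryDirectory() as root:
    for class_name in ("puppy", "bunny"):
        os.makedirs(os.path.join(root, class_name))
    class_indices = infer_class_indices(root)

print(class_indices)  # {'bunny': 0, 'puppy': 1}
```

In Keras itself, the same mapping is available as training_data.class_indices, and every batch drawn from the generator is an (images, labels) pair, so there is no separate y_train or y_val to build by hand.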

Question 2:

I would recommend using the validation_split parameter of ImageDataGenerator (doc) to set the fraction of images reserved for validation, and then the subset parameter of flow_from_directory (doc) to define separate training and validation generators. Note that flow_from_directory returns a generator that yields batches of (images, labels), not the data itself, so passing it through train_test_split does not produce something fit_generator accepts as validation_data.

So your code would look like:

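# Assumes TensorFlow's bundled Keras; with standalone Keras,
# import from keras.preprocessing.image instead.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Reserve 20% of the images under the training directory for validation.
# Note: the augmentation settings below also apply to the validation subset.
data_generator = ImageDataGenerator(
    validation_split=0.2,
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)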

train_generator = data_generator.flow_from_directory(
    './images/train',
    target_size=(28, 28),
    batch_size=86,
    class_mode='binary',
    color_mode='rgb',
    classes=None,
    subset='training'
)

validation_generator = data_generator.flow_from_directory(
    './images/train',
    target_size=(28, 28),
    batch_size=86,
    class_mode='binary',
    color_mode='rgb',
    classes=None,
    subset='validation'
)

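# len() of a generator returned by flow_from_directory is its number of batches per epoch.
# fit_generator is deprecated in TensorFlow 2.1+; there, pass the generators to classifier.fit instead.
history = classifier.fit_generator(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=2,
    validation_data=validation_generator,
    validation_steps=len(validation_generator),
    callbacks=[learning_rate_reduction]
)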
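
The step counts passed to the fit call are meant to be the number of batches per epoch: the number of samples divided by the batch size, rounded up to a whole number. Here is a minimal sketch of that arithmetic. The helper name batches_per_epoch is hypothetical, and the counts 152 and 23 are the totals reported in the question, used only as sample inputs (the training and validation subsets will each be smaller than 152).

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


train_steps = batches_per_epoch(152, 86)  # 152 images found under ./images/train
test_steps = batches_per_epoch(23, 86)    # 23 images found under ./images/test

print(train_steps, test_steps)  # 2 1
```

For a generator returned by flow_from_directory, len(generator) gives this same number of batches, so the division does not have to be written out by hand.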

Upvotes: 2
