I'm trying to divide my images (a dataset of bunnies and dogs) into x_train, x_val, y_train, y_val, and testing.
The following is what I did:
I placed the photos of each class (dogs/bunnies) in separate folders inside two folders: training and testing.
Training directory-> Bunny directory -> bunny images
Training directory-> Puppy directory -> puppy images
Testing directory-> Bunny directory -> bunny images
Testing directory-> Puppy directory -> puppy images
I used the following code to get the images from the folders:
training_data = train_datagen.flow_from_directory('./images/train',
                                                  target_size=(28, 28),
                                                  batch_size=86,
                                                  class_mode='binary',
                                                  color_mode='rgb',
                                                  classes=None)
test_data = test_datagen.flow_from_directory('./images/test',
                                             target_size=(28, 28),
                                             batch_size=86,
                                             class_mode='binary',
                                             color_mode='rgb',
                                             classes=None)
Which gives me the following output:
Found 152 images belonging to 2 classes.
Found 23 images belonging to 2 classes.
Question 1: I wasn't sure how to define my labels here (y_val / y_train), or whether I need to (but it appears that most models have y_val / y_train).
Question 2: I tried to run
x_train, x_val = train_test_split(training_data, test_size=0.1)
in order to at least split my training data into training and validation sets, but when I tried to run my model it gave me the following error:
history = classifier.fit_generator(x_train,
                                   steps_per_epoch=(8000 / 86),
                                   epochs=2,
                                   validation_data=x_val,
                                   validation_steps=8000 / 86,
                                   callbacks=[learning_rate_reduction])
ValueError: validation_data should be a tuple (val_x, val_y, val_sample_weight) or (val_x, val_y).
Found: [(array([[[[0.5058095 , 0.46913707, 0.42369673],...
Upvotes: 0
Views: 5223
Reputation: 545
Question 1:
In my experience, there are no real restrictions on how you name the x and y variables. For example, in this kernel the author uses y_train and y_test for the labels, and here the author uses train_Y. The only rule of thumb is to pick names that show what the variable holds.
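On whether the labels need to be defined at all: as far as I know, flow_from_directory infers them from the class subdirectory names, sorted alphanumerically, so with class_mode='binary' each batch already comes as (images, labels). The sketch below imitates that mapping with the standard library only; infer_class_indices is a hypothetical helper written for illustration, not part of Keras. On a real generator the mapping should be available as its class_indices attribute.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Map each class subdirectory to an integer label, in sorted order.

    Hypothetical helper: it only imitates how flow_from_directory is
    documented to assign labels; it is not a Keras function.
    """
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Build a throwaway folder layout like the one described in the question.
with tempfile.TemporaryDirectory() as root:
    for class_name in ("Puppy", "Bunny"):
        os.makedirs(os.path.join(root, class_name))
    class_indices = infer_class_indices(root)

print(class_indices)  # {'Bunny': 0, 'Puppy': 1}
```

So a separate y_train or y_val array is not required when you train from a directory generator.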
Question 2:
I would recommend using the validation_split parameter of ImageDataGenerator (doc) to set the fraction of images reserved for validation. After that, use the subset parameter of flow_from_directory (doc) to define the train_generator and validation_generator variables. (Note that flow_from_directory returns a generator, not the data itself.)
So your code would look like:
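# Import path may differ by version (e.g. tensorflow.keras.preprocessing.image).
from keras.preprocessing.image import ImageDataGenerator

# One generator config; validation_split reserves 20% of the images for validation.
data_generator = ImageDataGenerator(
    validation_split=0.2,
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)
# Both generators read the same directory; subset selects which part of the split to use.
train_generator = data_generator.flow_from_directory(
    './images/train',
    target_size=(28, 28),
    batch_size=86,
    class_mode='binary',
    color_mode='rgb',
    classes=None,
    subset="training"
)
validation_generator = data_generator.flow_from_directory(
    './images/train',
    target_size=(28, 28),
    batch_size=86,
    class_mode='binary',
    color_mode='rgb',
    classes=None,
    subset="validation"
)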
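# classifier and learning_rate_reduction are defined elsewhere, as in the question.
history = classifier.fit_generator(
    train_generator,
    steps_per_epoch=len(train_generator),  # number of batches per epoch (an integer)
    epochs=2,
    validation_data=validation_generator,
    validation_steps=len(validation_generator),
    callbacks=[learning_rate_reduction]
)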
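A note on the step counts: steps_per_epoch and validation_steps are normally whole numbers of batches derived from the image count and the batch size. A minimal sketch of that arithmetic, using the counts reported in the question (152 training images, 23 test images, batch size 86); steps_for is a hypothetical helper name. After a validation_split the per-subset counts will differ, so in practice I would read them from each generator (its samples attribute, as far as I know) rather than hard-coding them.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


batch_size = 86

# Counts taken from the "Found N images" output in the question.
train_steps = steps_for(152, batch_size)
test_steps = steps_for(23, batch_size)

print(train_steps, test_steps)  # 2 1
```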
Upvotes: 2