Reputation: 213
Please show or explain a dummy code snippet demonstrating K-Fold cross-validation with flow_from_dataframe and the resulting training and validation generators in Keras. This is my current code (no k-fold, only a simple fit):
ImageDataGenerator object to perform all the augmentations
from keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (150, 150)
core_idg = ImageDataGenerator(samplewise_center=True,
                              samplewise_std_normalization=True,
                              horizontal_flip=True,
                              vertical_flip=False,
                              height_shift_range=0.05,
                              width_shift_range=0.1,
                              rotation_range=5,
                              shear_range=0.1,
                              fill_mode='reflect',
                              zoom_range=0.15)
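Split the main dataframe into train_df and valid_df
from sklearn.model_selection import train_test_split

# the dataframe being split and the stratify column must come from the same dataframe
train_df, valid_df = train_test_split(df_large,
                                      test_size=0.10,
                                      random_state=2018,
                                      stratify=df_large['BINARY'])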
Create train_gen and valid_gen using the flow_from_dataframe method of the ImageDataGenerator object created above. "IMAGE_NAMES" and "BINARY" are the columns that hold the image names and the label (0 or 1).
all_labels = ["0", "1"]
train_gen = core_idg.flow_from_dataframe(dataframe=train_df,
                                         directory="./DataFolder/",
                                         x_col='IMAGE_NAMES',
                                         y_col='BINARY',
                                         class_mode='categorical',
                                         classes=all_labels,
                                         target_size=IMG_SIZE,
                                         color_mode='rgb',
                                         batch_size=64)
valid_gen = core_idg.flow_from_dataframe(dataframe=valid_df,
                                         directory="./DataFolder/",
                                         x_col='IMAGE_NAMES',
                                         y_col='BINARY',
                                         class_mode='categorical',
                                         classes=all_labels,
                                         target_size=IMG_SIZE,
                                         color_mode='rgb',
                                         batch_size=256)
test_X, test_Y = next(core_idg.flow_from_dataframe(dataframe=valid_df,
                                                   directory="./DataFolder/",
                                                   x_col='IMAGE_NAMES',
                                                   y_col='BIN_STR',
                                                   class_mode='categorical',
                                                   classes=all_labels,
                                                   target_size=IMG_SIZE,
                                                   color_mode='rgb',
                                                   batch_size=256))
# fitting
hist = model.fit_generator(train_gen,
                           validation_data=(test_X, test_Y),
                           epochs=30,
                           callbacks=call_list)
Now, how do I translate this to K-Fold cross-validation?
As I understand it, core_idg has to be created once outside the K-Fold loop, and instead of a fixed train_df and valid_df we should use the indices produced by the K-Fold split to build them for each fold.
So how can the code snippet above be transformed?
Upvotes: 4
Views: 4707
Reputation: 213
Something like this worked for me: creating the dataframes inside the K-fold loop.
import numpy as np
from sklearn.model_selection import KFold
from keras.preprocessing.image import ImageDataGenerator

# df_large, k_folds, n_epochs, callback_list and build_model() are defined elsewhere
IMG_SIZE = (150, 150)
core_idg = ImageDataGenerator(samplewise_center=True,
                              samplewise_std_normalization=True,
                              horizontal_flip=True,
                              vertical_flip=False,
                              height_shift_range=0.05,
                              width_shift_range=0.1,
                              rotation_range=5,
                              shear_range=0.1,
                              fill_mode='reflect',
                              zoom_range=0.15)
# Training with K-fold cross validation
kf = KFold(n_splits=k_folds, random_state=None, shuffle=True)
X = np.array(df_large["IMAGE_NAMES"])
for train_index, test_index in kf.split(X):
    trainData = X[train_index]
    testData = X[test_index]
    ## create train, valid dataframe and thus train_gen, valid_gen for each fold-loop
    train_df = df_large.loc[df_large["IMAGE_NAMES"].isin(list(trainData))]
    valid_df = df_large.loc[df_large["IMAGE_NAMES"].isin(list(testData))]
    # create a fresh model object so that no weights carry over between folds
    model = build_model()
    all_labels = ["0", "1"]
    train_gen = core_idg.flow_from_dataframe(dataframe=train_df,
                                             directory="./DataFolder/",
                                             x_col='IMAGE_NAMES',
                                             y_col='BINARY',
                                             class_mode='categorical',
                                             classes=all_labels,
                                             target_size=IMG_SIZE,
                                             color_mode='rgb',
                                             batch_size=64)
    valid_gen = core_idg.flow_from_dataframe(dataframe=valid_df,
                                             directory="./DataFolder/",
                                             x_col='IMAGE_NAMES',
                                             y_col='BINARY',
                                             class_mode='categorical',
                                             classes=all_labels,
                                             target_size=IMG_SIZE,
                                             color_mode='rgb',
                                             batch_size=256)
    # fit the model built for this fold; steps_per_epoch counts batches,
    # not samples, and len(train_gen) is the number of batches per epoch
    hist = model.fit_generator(
        train_gen,
        steps_per_epoch=len(train_gen),
        epochs=n_epochs,
        validation_data=valid_gen,
        callbacks=callback_list
    )
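A note on steps_per_epoch in the fit_generator call: Keras counts it in batches, not in samples, so one pass over a fold needs roughly the fold size divided by the batch size, rounded up. Here is a minimal sketch of that arithmetic. The helper name steps_for and the fold sizes are made up for illustration and are not part of the code above.

```python
import math


def steps_for(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (hypothetical helper)."""
    return math.ceil(n_samples / batch_size)


# Hypothetical fold sizes, for illustration only.
n_train, n_valid = 4000, 1000

train_steps = steps_for(n_train, 64)    # batch_size used for train_gen
valid_steps = steps_for(n_valid, 256)   # batch_size used for valid_gen
print(train_steps, valid_steps)
```

For the iterators returned by flow_from_dataframe, len(train_gen) should give the same batch count, so it can be passed directly instead of computing it by hand.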
If you have any suggestions to make this better, please comment.
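One possible refinement, offered as a sketch rather than a tested recipe: since the labels are binary and the original single split was stratified, StratifiedKFold could keep the class ratio similar in every fold. The indices it yields are positional, so iloc can select the rows directly without the isin lookup. The small dataframe below is a made-up stand-in for df_large, and the generator and model steps are only indicated by a comment.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for df_large: 20 images, 5 of them labelled "1".
df_large = pd.DataFrame({
    "IMAGE_NAMES": [f"img_{i:03d}.png" for i in range(20)],
    "BINARY": ["1" if i % 4 == 0 else "0" for i in range(20)],
})

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=2018)

folds = []
for fold, (train_idx, valid_idx) in enumerate(
        skf.split(df_large["IMAGE_NAMES"], df_large["BINARY"]), start=1):
    # split() yields positional indices, so iloc picks the rows directly.
    train_df = df_large.iloc[train_idx]
    valid_df = df_large.iloc[valid_idx]
    folds.append((train_df, valid_df))
    # train_df and valid_df would be passed to core_idg.flow_from_dataframe(...)
    # here, followed by building and fitting a fresh model for this fold.
    n_pos = int((valid_df["BINARY"] == "1").sum())
    print(f"fold {fold}: train={len(train_df)} valid={len(valid_df)} positives={n_pos}")
```

Collecting hist.history from each fold in a list would then make it possible to average the validation metric across folds at the end.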
Upvotes: 3