Reputation: 7374
I'm trying to build a simple image classifier using Keras with TensorFlow as the backend. However, I'm having a very hard time understanding how normalization is done in Keras.
It is my understanding that in machine learning you calculate the mean and std of the training + validation set and then reuse that mean and std when normalizing the test set and when predicting on new data. With this in mind, I will explain what I don't understand in each part of Keras.
train_datagen = ImageDataGenerator(rescale=1./255, samplewise_center=True, samplewise_std_normalization=True, shear_range=0.2, zoom_range=0.2)
test_datagen = ImageDataGenerator(rescale=1./255, samplewise_center=True, samplewise_std_normalization=True)
batch_size = 1024
train_generator = train_datagen.flow(X_train, one_hot_train_labels, batch_size=batch_size, shuffle=True)
validation_generator = test_datagen.flow(X_valid, one_hot_valid_labels, batch_size=batch_size)
My first questions are about ImageDataGenerator.
The documentation says that the flow function normalizes the data, so I have the following questions:
- What is the effect of samplewise_std_normalization and samplewise_center if it is the flow function that does the normalization?
- How can Keras do normalization on augmented data that is generated at runtime, so the mean and std are not known beforehand?
result = model.evaluate(X_test, one_hot_test_labels)
When we run evaluate I have one question:
- How is the normalization handled here? I don't have access to the mean and std, so I can't apply them to the test set as well.
predict_softmaxs = model.predict(np.array(resized_images))
When I run predict I have one question:
Upvotes: 2
Views: 1258
Reputation: 2378
- What is the effect of samplewise_std_normalization and samplewise_center if it is the flow function that does the normalization?
It's common practice to define transformers before you run them. For example, scikit-learn's transformers also do this (their StandardScaler works in an analogous way).
- How can Keras do normalization on augmented data that is generated at runtime, so the mean and std are not known beforehand?
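It can normalize at runtime without any global statistics: the samplewise options standardize each image using that image's own mean and std.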
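To illustrate how normalization can happen at runtime without precomputed statistics, here is a minimal NumPy sketch of per-image (samplewise) standardization applied to batches as they are produced. It is not Keras's implementation: `samplewise_standardize`, `flow_batches`, the random flip standing in for augmentation, the `eps` term, and the random input data are all assumptions made for this example.

```python
import numpy as np


def samplewise_standardize(image, eps=1e-6):
    """Standardize one image with its own mean and std (no dataset statistics)."""
    image = image.astype(np.float64)
    centered = image - image.mean()
    return centered / (image.std() + eps)


def flow_batches(images, batch_size, rng):
    """Yield batches whose images are augmented first, then standardized one by one."""
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        out = []
        for image in batch:
            # Stand-in augmentation: flip left-right with probability 0.5.
            if rng.random() < 0.5:
                image = image[:, ::-1, :]
            out.append(samplewise_standardize(image))
        yield np.stack(out)


rng = np.random.default_rng(0)
# Hypothetical data: 10 RGB images of 8x8 pixels with values in [0, 255].
images = rng.integers(0, 256, size=(10, 8, 8, 3))
batches = list(flow_batches(images, batch_size=4, rng=rng))
```

Each image in each batch ends up with a mean close to 0 and a std close to 1, regardless of which augmentation was applied before it.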
It can be argued, though, that this is not the correct way to standardize data. The correct way would be to standardize according to the training set.
That means you should first fit the generator on the training data, and only then use the flow method.
This is actually baked into ImageDataGenerator: if you specify featurewise_std_normalization and do not fit your generator, Keras will warn you when it tries to standardize a batch.
evaluate and predict are the model's methods. They just run the model on the input data, so they don't do any normalization; you should feed them data that has already been normalized.
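One way to do that by hand is sketched below in NumPy, assuming per-channel statistics computed on the training images only and a channels-last layout. The helper names `fit_channel_stats` and `standardize`, the `eps` term, and the random arrays standing in for `X_train` and `X_test` are assumptions for illustration; the `model` calls are left as comments because they depend on your trained model.

```python
import numpy as np


def fit_channel_stats(x_train):
    """Per-channel mean and std over all training images (channels-last layout)."""
    mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
    std = x_train.std(axis=(0, 1, 2), keepdims=True)
    return mean, std


def standardize(x, mean, std, eps=1e-6):
    """Apply statistics learned from the training set to any image array."""
    return (x - mean) / (std + eps)


rng = np.random.default_rng(0)
# Stand-ins for X_train and X_test, already rescaled to [0, 1].
x_train = rng.random((100, 8, 8, 3))
x_test = rng.random((20, 8, 8, 3))

mean, std = fit_channel_stats(x_train)        # learned once, from training data only
x_train_norm = standardize(x_train, mean, std)
x_test_norm = standardize(x_test, mean, std)  # same statistics reused

# result = model.evaluate(x_test_norm, one_hot_test_labels)
# predict_softmaxs = model.predict(x_test_norm)
```

If the training generator uses the samplewise options instead, as in the question, the same per-image transform would be applied to the test data and no stored statistics are needed.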
Upvotes: 3