Reputation: 73
I'm trying to do image recognition, so I looked at the CIFAR10 example of Keras.
Before fitting the model, the data (X_train/X_test) needs to be normalized to the 0-1 range and converted to float32. That's fine for a small dataset like CIFAR10, but as the dataset grows, converting all of it to float32 consumes a large amount of memory, and I don't want to convert the whole dataset up front.
Can the conversion to float32 and the normalization be done per mini-batch in Keras?
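To give a rough sense of the sizes involved, here is a quick sketch with numpy (the shape matches CIFAR10's training set, but the point applies to any larger dataset):
import numpy as np

# 50,000 RGB images of 32x32 pixels stored as uint8
X_train = np.zeros((50000, 32, 32, 3), dtype=np.uint8)
print(X_train.nbytes / 1024 ** 2)        # ~146 MB

# converting to float32 multiplies the memory use by four
X_train_float = X_train.astype('float32') / 255
print(X_train_float.nbytes / 1024 ** 2)  # ~586 MB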
Upvotes: 4
Views: 4911
Reputation: 57609
You can do the conversion once and store the normalized, converted data in a file that you then load for training. That way you don't need to convert the data every time.
For example (normalize.py, Python 3):
from keras.datasets import cifar10
import pickle

# load CIFAR10, convert to float32 and scale to the 0-1 range
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# store the preprocessed data so the conversion only has to be done once
with open('cifar10_normalized.pkl', 'wb') as f:
    pickle.dump(((X_train, y_train), (X_test, y_test)), f)
In your code (e.g. train.py) you can then do:
import pickle

with open('cifar10_normalized.pkl', 'rb') as f:
    (X_train, y_train), (X_test, y_test) = pickle.load(f)
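Keep in mind that the pickle file and the arrays you load from it already contain float32 data, so they take roughly four times the space of the original uint8 images. If even that is too much memory, the per-batch approaches below avoid holding a float32 copy of the whole dataset.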
Another possibility is to do the normalization and conversion for each batch. Use model.train_on_batch to run a single training step on one batch. For example:
import numpy as np

for (x_train, y_train) in yourData:
    x_train = x_train.astype(np.float32) / 255
    model.train_on_batch(x_train, y_train)
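As a minimal sketch of what yourData could look like — assuming the model is already compiled, X_train/y_train are kept in memory as uint8, and you pick a batch_size yourself — you can slice out mini-batches and convert only the current slice:
import numpy as np

batch_size = 32  # assumption: choose whatever fits your memory/GPU

def batches(X, y, batch_size):
    # yield (x_batch, y_batch) slices of the uint8 arrays, one mini-batch at a time
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

for epoch in range(10):
    for x_batch, y_batch in batches(X_train, y_train, batch_size):
        # only this small slice is converted to float32, never the whole dataset
        x_batch = x_batch.astype(np.float32) / 255
        model.train_on_batch(x_batch, y_batch)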
Finally, you can also use a Python generator for training:
def g():
    for (x_train, y_train) in yourData:
        x_train = x_train.astype(np.float32) / 255
        yield (x_train, y_train)

model.fit_generator(g(), steps_per_epoch=len(yourData))  # steps_per_epoch = number of batches per epoch
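Note that fit_generator expects the generator to keep yielding batches indefinitely when training for more than one epoch, and steps_per_epoch tells it how many batches make up one epoch (in older Keras versions the argument is samples_per_epoch instead). A sketch under the same assumptions as above (compiled model, uint8 X_train/y_train in memory, a batch_size you choose):
import numpy as np

def batch_generator(X, y, batch_size):
    # loop over the uint8 data forever, yielding one float32 mini-batch at a time
    while True:
        for start in range(0, len(X), batch_size):
            x_batch = X[start:start + batch_size].astype(np.float32) / 255
            yield x_batch, y[start:start + batch_size]

steps = int(np.ceil(len(X_train) / batch_size))  # batches per epoch
model.fit_generator(batch_generator(X_train, y_train, batch_size),
                    steps_per_epoch=steps, epochs=10)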
Upvotes: 4