gel

Reputation: 73

How to train/test without converting all of the data to float32 in Keras?

I'm trying to do image recognition, so I looked at the CIFAR10 example of Keras.

Before fitting the model, the data (X_train/X_test) needs to be normalized to the range 0-1 and converted to float32. That's fine for a small dataset like CIFAR10, but as the dataset grows, converting all of it to float32 consumes a large amount of memory. I do not want to convert all the data to float32.

Can this (converting the data to float32 and normalizing it) be done for each mini-batch in Keras?

Upvotes: 4

Views: 4911

Answers (1)

nemo

Reputation: 57609

You can do the conversion once and store the normalized, converted data in a file that you load for training. This way you don't need to convert it every time.

For example (normalize.py / Python 3):

from keras.datasets import cifar10
import pickle

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

with open('cifar10_normalized.pkl', 'wb') as f:
    pickle.dump(((X_train, y_train), (X_test, y_test)), f)

In your code (e.g. train.py) you can then do:

import pickle

with open('cifar10_normalized.pkl', 'rb') as f:
    (X_train, y_train), (X_test, y_test) = pickle.load(f)

Another possibility is to do the normalization and conversion for each batch. Use model.train_on_batch to run a single batch. For example:

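import numpy as np

# yourData yields (uint8 images, labels) one mini-batch at a time
for (x_train, y_train) in yourData:
    x_train = x_train.astype(np.float32) / 255  # convert only this batch
    model.train_on_batch(x_train, y_train)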

Finally, you can also use a Python generator for training:

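import numpy as np

def g():
    while True:  # Keras expects the generator to loop indefinitely
        for (x_train, y_train) in yourData:
            x_train = x_train.astype(np.float32) / 255
            yield (x_train, y_train)

# Pass the generator object, not the function; num_batches is a placeholder (Keras 2 signature)
model.fit_generator(g(), steps_per_epoch=num_batches)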
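
To make the per-batch idea concrete, here is a minimal sketch of what yourData and the generator could look like when the images stay in memory as uint8. The name batch_generator, the batch size, and the random stand-in data are assumptions made for illustration; they are not part of Keras or of the question. Only NumPy is used, so it runs without a model:

```python
import numpy as np

def batch_generator(X, y, batch_size=32):
    """Yield (x_batch, y_batch) forever; only the current batch is float32."""
    n = len(X)
    while True:  # loop indefinitely, as Keras-style generators are expected to
        for start in range(0, n, batch_size):
            # Slice the uint8 array, then convert and scale just this slice
            x_batch = X[start:start + batch_size].astype(np.float32) / 255
            yield x_batch, y[start:start + batch_size]

# Stand-in for a real uint8 image set (random data, CIFAR10-shaped)
X_uint8 = np.random.randint(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
y_labels = np.random.randint(0, 10, size=(100, 1))

gen = batch_generator(X_uint8, y_labels, batch_size=32)
x_batch, y_batch = next(gen)
print(x_batch.dtype, x_batch.shape)  # float32 (32, 32, 32, 3)
print(X_uint8.dtype)                 # the full array is still uint8
```

With a generator like this, the full dataset is never copied to float32; only one batch at a time is. If you pass it to fit_generator, steps_per_epoch would presumably be the number of batches per pass, i.e. ceil(len(X) / batch_size).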

Upvotes: 4
