Kate

Reputation: 49

Cannot get reproducible results with ImageDataGenerator in keras

I am trying to get reproducible results across multiple runs of the same script in Keras, but I get different results on every run. My code looks like this:

import numpy as np
from numpy.random import seed
import random as rn
import os

seed_num = 1
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
os.environ['PYTHONHASHSEED'] = '1'
os.environ['TF_DETERMINISTIC_OPS'] = '1'
np.random.seed(seed_num)
rn.seed(seed_num)

import tensorflow as tf
tf.random.set_seed(seed_num)

import tensorflow.keras as ks
from tensorflow.python.keras import backend as K

...some imports...
from tensorflow.keras.preprocessing.image import ImageDataGenerator


.... data loading etc ....

generator = ImageDataGenerator(
                width_shift_range=0.1,
                height_shift_range=0.1,
                horizontal_flip=True)
                      
generator.fit(X_train, seed=seed_num)                
my_model.fit(generator.flow(X_train, y_train, batch_size=batch_size, shuffle=False, seed=seed_num), validation_data=(X_val, y_val), callbacks=callbacks , epochs=epochs, shuffle=False)

I traced the problem to ImageDataGenerator: when I set generator = ImageDataGenerator() without any augmentation, the results are reproducible. I am running on CPU, and the TensorFlow version is 2.4.1. What am I missing here?

Upvotes: 1

Views: 939

Answers (1)

Andrea Maranesi

Reputation: 121

Running on a GPU while creating augmented images can produce nondeterministic results.

To get reproducible results with ImageDataGenerator on a GPU, one approach is the following:

import random, os
import numpy as np
import tensorflow as tf

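def set_seed(seed=0):
  # Seed every random number generator the training code may draw from.
  np.random.seed(seed)
  tf.random.set_seed(seed)
  random.seed(seed)
  # Ask TensorFlow / cuDNN to use deterministic op implementations.
  os.environ['TF_DETERMINISTIC_OPS'] = "1"
  os.environ['TF_CUDNN_DETERMINISTIC'] = "1"
  # Note: PYTHONHASHSEED is read when the interpreter starts, so setting it
  # here only affects child processes, not string hashing in this process.
  os.environ['PYTHONHASHSEED'] = str(seed)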

set_seed()

Call set_seed() again immediately before model.fit():

    set_seed()
    model.fit(...) 
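
One plausible reason the second call helps: if the augmentation parameters are drawn from a global random stream, then any random draws made between the first set_seed() and model.fit() (weight initialization, shuffling, and so on) shift where that stream stands when training starts. The sketch below illustrates the effect using only the standard library random module as a stand-in for the global RNG; fake_augmentation_params and run are hypothetical names made up for this illustration, not Keras functions.

```python
import random


def fake_augmentation_params(n):
    """Stand-in for random augmentation: n (shift, flip) pairs from the global RNG."""
    return [(random.uniform(-0.1, 0.1), random.random() < 0.5) for _ in range(n)]


def run(extra_draws, reseed_before_fit, seed=0):
    """Simulate one script run: seed, consume some draws, then 'augment'."""
    random.seed(seed)
    for _ in range(extra_draws):
        random.random()  # e.g. draws consumed by setup code before training
    if reseed_before_fit:
        random.seed(seed)  # the "call set_seed() again" step
    return fake_augmentation_params(3)


# Without reseeding, the result depends on how many draws happened earlier.
print(run(0, reseed_before_fit=False) == run(5, reseed_before_fit=False))
# With reseeding right before "fit", the augmentation draws match.
print(run(0, reseed_before_fit=True) == run(5, reseed_before_fit=True))
```

This only models the seeding order; it does not reproduce nondeterminism that comes from the GPU kernels themselves.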

Alternatively, you can install the tensorflow-determinism package:

pip install tensorflow-determinism

If you're using Google Colab, restart your runtime after installing, or it probably won't work.

The package works with the GPU to produce deterministic results.

import random, os
import numpy as np
import tensorflow as tf

def set_seed(seed=0):
  os.environ['TF_DETERMINISTIC_OPS'] = '1'
  random.seed(seed)
  np.random.seed(seed)
  tf.random.set_seed(seed)

set_seed()

# code

In this case too, call set_seed() again immediately before model.fit():

    set_seed()
    model.fit(...) 
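
To check whether two runs really are identical, one option is to compare a hash of the final weights rather than eyeballing the loss values. The helper below is a minimal sketch, not part of Keras or TensorFlow; fingerprint is a hypothetical name, and the commented usage line assumes a trained Keras model called model.

```python
import hashlib
from array import array


def fingerprint(buffers):
    """Return a SHA-256 hex digest over a sequence of bytes-like objects."""
    digest = hashlib.sha256()
    for buf in buffers:
        data = bytes(buf)
        # Include each buffer's length so chunk boundaries affect the hash.
        digest.update(len(data).to_bytes(8, "little"))
        digest.update(data)
    return digest.hexdigest()


# Assumed usage with a Keras model (not executed here):
#   print(fingerprint(w.tobytes() for w in model.get_weights()))

# Stand-in "weights" so the sketch runs without TensorFlow installed.
run_a = [array("f", [0.1, 0.2]), array("f", [0.3])]
run_b = [array("f", [0.1, 0.2]), array("f", [0.3])]
print(fingerprint(run_a) == fingerprint(run_b))
```

If the printed fingerprint is the same across separate runs of the script, the weights are bit-for-bit identical.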

Upvotes: 1
