Reputation: 61
I am using Keras to build a model that takes 720x1280 images as input and outputs a single value.
I am having a problem with keras.models.Sequential.predict_generator when using the keras.utils.Sequence class to obtain the values corresponding to the images in the validation/training sets. The returned values are shuffled, so I don't know which output corresponds to which image.
This is how my generators are defined:
import numpy as np
from skimage.io import ImageCollection, imread
from keras.utils import Sequence

def load_images(f):
    return imread(f).astype(np.float64)

class DataSetImageKeras(Sequence):
    def __init__(self, image_collection, values, batch_size):
        self.images = image_collection
        self.hf = values
        self.batch_size = batch_size
        self.n = len(self.images)
        self.x_scale = 250
        self.y_scale = 1e4

    def __len__(self):
        return int(np.ceil(len(self.images) / float(self.batch_size)))

    def __getitem__(self, idx):
        # batch_x is a numpy.ndarray
        batch_x = (
            self.images[idx:min(idx + self.batch_size, self.n)]
            .concatenate()
            .reshape(self.batch_size, 720, 1280, 1)
        )
        batch_y = self.hf[idx:min(idx + self.batch_size, self.n)]
        return batch_x / self.x_scale, batch_y / self.y_scale

images_train = ImageCollection(images_paths_train, load_func=load_images)
images_val = ImageCollection(images_paths_test, load_func=load_images)

data_train = DataSetImageKeras(images_train, values_train, n_batch)
data_val = DataSetImageKeras(images_val, values_val, n_batch)

from keras.models import load_model
model = load_model('model001')  # this model is already trained
If I use the following code:
import matplotlib.pyplot as plt

val_result = []
val_hf = []
for (batch_x, batch_y) in data_val:
    val_result.append(model.predict_on_batch(batch_x))
    val_hf.append(batch_y)

val_result = np.concatenate(val_result)
val_hf = np.concatenate(val_hf)

plt.plot(val_hf, val_result, marker='.', linestyle='')
The correct result is obtained (as seen in this image, where x is the desired value and y is the predicted value).
However, if I use the predict_generator function, as below:
val_result = model.predict_generator(data_val, verbose=1,
                                     workers=1,
                                     max_queue_size=50,
                                     use_multiprocessing=False)
The output is shuffled, as can be seen here.
My problem is similar to #5048 and #6745, which should have been solved by #6891, but I am using Keras version 2.1.6 and it still shuffles my predictions, even when using workers=1.
It is also similar to this, but I didn't find anything that could reset the generators, and the problem is still present if I define a new generator and try to run predict_generator again.
I also found something stating that it could be related to the batch size not dividing the number of samples exactly, but the problem is still present even when I use n_batch=1.
As a side note, it might be that predict_generator is not shuffling the data, but only returning it with an index offset, since the input data in values and images_paths is already shuffled.
Upvotes: 4
Views: 1865
Reputation: 61
predict_generator was not shuffling my predictions, after all. The problem was with the __getitem__ method. For instance, using n_batch=32, the method would yield values from 1 to 32, then from 2 to 33 and so forth, instead of from 1 to 32, then 33 to 64, etc.
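In other words, Keras passes a batch index (0, 1, 2, ...) to __getitem__, not a sample offset, so slicing with idx directly produced overlapping batches shifted by one sample. A minimal sketch of the two indexing schemes (batch_size=32 and n=100 are only illustrative values):

batch_size = 32
n = 100  # hypothetical number of samples
for idx in range(3):
    # original (buggy) slicing: idx treated as a sample offset -> windows shifted by one sample
    print('buggy  :', idx, min(idx + batch_size, n))    # (0, 32), (1, 33), (2, 34)
    # corrected slicing: idx treated as a batch index -> disjoint windows tiling the dataset
    idx_min = idx * batch_size
    idx_max = min(idx_min + batch_size, n)
    print('correct:', idx_min, idx_max)                 # (0, 32), (32, 64), (64, 96)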
Changing the method as follows solves the problem:
def __getitem__(self, idx):
    # batch_x is a numpy.ndarray
    idx_min = idx * self.batch_size
    idx_max = min(idx_min + self.batch_size, self.n)
    batch_x = (
        self.images[idx_min:idx_max]
        .concatenate()
        .reshape(self.batch_size, 720, 1280, 1)
    )
    batch_y = self.hf[idx_min:idx_max]
    return batch_x / self.x_scale, batch_y / self.y_scale
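With this indexing, the batches tile the dataset in order, so the output of predict_generator lines up with the stored targets again. A quick way to redo the earlier comparison plot (reusing model, data_val, np, and plt from the snippets above, and assuming as before that the number of samples is a multiple of n_batch, since the reshape requires it):

# predictions now come back in dataset order
val_result = model.predict_generator(data_val, verbose=1, workers=1,
                                     max_queue_size=50, use_multiprocessing=False)
# collect the (already scaled) target values in the same order
val_hf = np.concatenate([batch_y for _, batch_y in data_val])
plt.plot(val_hf, val_result, marker='.', linestyle='')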
Upvotes: 1