Reputation: 7339
I would like a canonical answer on the best way to convert input rgb images to grayscale in Keras. This answer hints that perhaps such a thing would be best achieved with a Lambda, but that feels inefficient to me. It seems to me that Average Pooling layers should be able to do the trick, but I can't seem to figure that out. Is there an RGB to Grayscale layer that I am just missing in the docs? It seems like that is a pretty commonplace operation.
Upvotes: 12
Views: 24738
Reputation: 168
There's a much easier way in Keras >= 2.1.6 to convert between RGB and grayscale. When you augment your image data with the ImageDataGenerator class, you can use its flow_from_directory method to create a generator object, which can then be used to train your model with fit_generator.
What is great about the flow_from_directory method is that it exposes several image-processing parameters, one of which is color_mode, which can be set to 'rgb' or 'grayscale'. I'm not sure why this parameter lives on the generator method rather than on the ImageDataGenerator constructor itself, but it does the trick.
If you are willing to put in a small effort to set up a generator (docs: https://keras.io/preprocessing/image/#imagedatagenerator-methods), this and several other useful pre-processing parameters become available.
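A minimal sketch of what that looks like, assuming TensorFlow's bundled Keras; the class subdirectory name "cats" and the 64x64 image size are arbitrary choices for illustration, and the example builds a throwaway directory tree just so it is runnable end to end:

```python
import os
import tempfile

import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Build a tiny directory tree with one RGB image so the example is self-contained.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "cats"))
Image.fromarray(
    np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
).save(os.path.join(root, "cats", "img0.png"))

datagen = ImageDataGenerator(rescale=1.0 / 255)
generator = datagen.flow_from_directory(
    root,
    target_size=(64, 64),
    color_mode="grayscale",  # images are loaded with a single channel
    batch_size=1,
    class_mode="categorical",
)

batch_x, batch_y = next(generator)
print(batch_x.shape)  # (1, 64, 64, 1) -- one grayscale channel
```

Note that the conversion happens at load time, so the model itself never sees the RGB data.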
Upvotes: 6
Reputation: 86600
There are a few formulas to transform a color image into a grayscale image. They're well established, and the choice often depends on whether you'd like brighter or darker results, better contrast, etc.
Three common formulas are here. Let's take the "luminosity" formula.
result = 0.21 R + 0.72 G + 0.07 B
This can only be achieved by a lambda layer. And it's not inefficient, it's just necessary math.
def converter(x):
    # x has shape (batch, height, width, channels)
    return (0.21 * x[:, :, :, :1]) + (0.72 * x[:, :, :, 1:2]) + (0.07 * x[:, :, :, -1:])
Add this lambda layer to the model:
Lambda(converter)
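A minimal sketch of the layer in a full model, assuming TensorFlow's Keras API; the 32x32 input size is arbitrary:

```python
import numpy as np
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

def converter(x):
    # x has shape (batch, height, width, channels); apply luminosity weights per channel
    return (0.21 * x[:, :, :, :1]) + (0.72 * x[:, :, :, 1:2]) + (0.07 * x[:, :, :, -1:])

inputs = Input(shape=(32, 32, 3))
gray = Lambda(converter)(inputs)  # output shape: (batch, 32, 32, 1)
model = Model(inputs, gray)

rgb = np.random.rand(4, 32, 32, 3).astype("float32")
out = model.predict(rgb)
print(out.shape)  # (4, 32, 32, 1)
```

Because the conversion is a differentiable operation inside the model, gradients flow through it during training.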
Although AveragePooling might seem like the way to go, those layers are meant to reduce the "spatial" dimensions, not the "channels". You'd need a lot of workarounds and reshaping to make one of these pooling layers apply to channels.
If you prefer to use a ready-made formula from TensorFlow, again use a lambda layer, now with this function, based on the answer you linked:
Lambda(lambda x: tf.image.rgb_to_grayscale(x))
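For reference, a small sketch of what tf.image.rgb_to_grayscale does on its own (it uses the ITU-R BT.601 weights 0.2989/0.587/0.114, slightly different from the luminosity formula above); the shapes here are arbitrary:

```python
import numpy as np
import tensorflow as tf

rgb = tf.constant(np.random.rand(2, 8, 8, 3), dtype=tf.float32)
gray = tf.image.rgb_to_grayscale(rgb)  # collapses the last axis to 1 channel
print(gray.shape)  # (2, 8, 8, 1)
```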
Other options for converter:
# perhaps faster? perhaps slower?
from keras import backend as K

def converter(x):
    # weights have shape (1, 1, 1, 3), broadcastable against (batch, height, width, 3)
    weights = K.constant([[[[0.21, 0.72, 0.07]]]])
    return K.sum(x * weights, axis=-1, keepdims=True)
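A quick NumPy check (my own sketch, not from the answer) that the weighted-sum version computes the same result as the channel-slicing version:

```python
import numpy as np

x = np.random.rand(2, 4, 4, 3)

# Weighted-sum version: broadcast (1, 1, 1, 3) weights, then reduce the channel axis.
weights = np.array([0.21, 0.72, 0.07]).reshape(1, 1, 1, 3)
by_sum = (x * weights).sum(axis=-1, keepdims=True)

# Channel-slicing version from the first converter.
by_slice = 0.21 * x[..., :1] + 0.72 * x[..., 1:2] + 0.07 * x[..., -1:]

print(np.allclose(by_sum, by_slice))  # True
```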
As Stepan Novikov commented, if your idea is simply to preprocess images, you can use other tools and avoid the trouble.
You only need to do this inside the model if it's important to you to keep track of the gradients in this operation.
Upvotes: 17