Reputation: 544
I am prototyping a deep learning segmentation model that needs six channels of input (two aligned 448x448 RGB images under different lighting conditions). I wish to compare the performance of several pretrained models to that of my current model, which I trained from scratch. Can I use the pretrained models in tf.keras.applications for input images with more than 3 channels?
I tried applying a convolution first to reduce the channel dimension to 3 and then passed that output to tf.keras.applications.DenseNet121(), but received the following error:
import tensorflow as tf
dense_input = tf.keras.layers.Input(shape=(448, 448, 6))
dense_filter = tf.keras.layers.Conv2D(3, 3, padding='same')(dense_input)
dense_stem = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet', input_tensor=dense_filter)
*** ValueError: You are trying to load a weight file containing 241 layers into a model with 242 layers.
Is there a better way to use pretrained models on data with a different number of input channels in Keras? Will pretraining even help when the number of input channels is different?
Upvotes: 3
Views: 5965
Reputation: 549
As a complementary approach to adding a convolutional layer before a pre-trained architecture (e.g. any of the pre-trained models available in tf.keras.applications that were trained with RGB inputs), you could consider manipulating the existing weights so that they match your model with 6-channel inputs. For example, if your architecture remains the same apart from the added input modalities, you can repeat the green-channel weights for the 3 newly added input channels: see here.
"Is there a better way to use pretrained models on data with a different number of input channels in keras? Will pretraining even help when the number of input channels is different?"
Both of the aforementioned techniques are commonly used and enable transfer learning, which is virtually always a better choice than starting the training from scratch. However, do not expect either option to work without some retraining. In my opinion and experience, the latter is better. The reason is that the randomly initialized Conv layers in the former approach would (at least initially) produce radically different inputs from what the rest of the architecture has "got used to seeing". This was already argued in the earlier answer by @Kris. The latter technique takes advantage of the fact that many of the relevant features are fairly similar across the different input modalities: a dog might still look like a dog even in a newly added input modality (e.g. RGB vs. thermal).
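To make the weight-manipulation idea concrete, here is a minimal NumPy sketch of one way it could be done. It assumes the first convolution kernel uses the Keras layout (height, width, input channels, output channels); the function name, the random stand-in kernel, and its 7x7x3x64 shape are illustrative assumptions, not values read from a real checkpoint.

```python
import numpy as np

def extend_kernel_by_repeating_channel(kernel, extra_channels, source_channel=1):
    """Append input channels to a conv kernel of shape (kh, kw, in_ch, out_ch)
    by copying the filters of `source_channel` (index 1 is green for RGB order)."""
    if kernel.ndim != 4:
        raise ValueError("expected shape (kh, kw, in_channels, out_channels)")
    # Slice with a range so the input-channel axis is kept: (kh, kw, 1, out_ch).
    source = kernel[:, :, source_channel:source_channel + 1, :]
    # Repeat the chosen channel once per newly added input channel.
    extra = np.repeat(source, extra_channels, axis=2)
    # Original RGB filters stay in place; the copies are appended after them.
    return np.concatenate([kernel, extra], axis=2)

# Stand-in for a pretrained first-layer kernel (random values, NOT real weights).
rgb_kernel = np.random.default_rng(0).normal(size=(7, 7, 3, 64)).astype(np.float32)
six_channel_kernel = extend_kernel_by_repeating_channel(rgb_kernel, extra_channels=3)
print(six_channel_kernel.shape)
```

In a real model, the 3-channel kernel would presumably be read from the pretrained network's first convolution with layer.get_weights(), and the extended kernel written into the same layer of an otherwise identical 6-channel model with layer.set_weights(). Because the extra channels add to the first layer's activations, some retraining should still be expected.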
Upvotes: 2
Reputation: 482
Cross Modality Pre-training may be the method you need. Proposed by Wang et al. (2016), this method averages the weights of the pre-trained model across the channels in the first layer and replicates the mean by the number of target channels. Their experimental results indicate that the network performs better with this kind of pre-training, even when it has 20 input channels and its input modality is not RGB.
To apply this, one can refer to another answer that uses layer.get_weights() and layer.set_weights() to manually set the weights in the first layer of the pre-trained model.
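As a rough illustration of the averaging step described above, the following NumPy sketch averages a first-layer kernel over its input-channel axis and replicates the mean for the target number of channels. The Keras kernel layout (height, width, input channels, output channels), the function name, and the random stand-in kernel are assumptions made for this example.

```python
import numpy as np

def cross_modality_kernel(kernel, target_channels):
    """Average a conv kernel of shape (kh, kw, in_ch, out_ch) over its input
    channels and replicate the mean `target_channels` times."""
    if kernel.ndim != 4:
        raise ValueError("expected shape (kh, kw, in_channels, out_channels)")
    # Mean over the input-channel axis, kept as a size-1 axis: (kh, kw, 1, out_ch).
    mean = kernel.mean(axis=2, keepdims=True)
    # Every target channel receives the same averaged filter.
    return np.repeat(mean, target_channels, axis=2)

# Stand-in for a pretrained first-layer kernel (random values, NOT real weights).
rgb_kernel = np.random.default_rng(0).normal(size=(7, 7, 3, 64)).astype(np.float32)
new_kernel = cross_modality_kernel(rgb_kernel, target_channels=6)
print(new_kernel.shape)
```

The resulting array would then be passed, together with the layer's other weights (such as a bias, if the layer has one), to layer.set_weights() on the first convolution of the 6-channel model.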
Upvotes: 3
Reputation: 23559
Technically, it should be possible. Perhaps using the model's __call__ itself:
orig_model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet')
dense_input = tf.keras.layers.Input(shape=(448, 448, 6))
dense_filter = tf.keras.layers.Conv2D(3, 3, padding='same')(dense_input)
output = orig_model(dense_filter)
model = tf.keras.Model(dense_input, output)
model.compile(...)
model.summary()
On a conceptual level, though, I'd be worried that the new input doesn't look much like the original input that the pretrained model was trained on.
Upvotes: 8