Reputation: 453
I have images with shape (100, 100, 3), and I want to use Keras 1D convolution to classify them. Is this possible, and if so, what input shape do I need to use?
PS: I use tf.data.Dataset, and my dataset is batched: (20, 100, 100, 3).
Upvotes: 2
Views: 3995
Reputation: 7591
TLDR; Not by itself, but maybe if composed.
The correlation between pixels in an image (2D, or 3D with multiple channels) is spatial in nature: the value of a given pixel is strongly influenced by its neighbors, both vertically and horizontally. The advantage of 2D/3D convolution (Conv2D or Conv3D) is that it captures this influence in both spatial directions: vertical and horizontal.
In comparison, 1D convolution (Conv1D) captures only one of the two correlations (either vertical or horizontal), yielding much more limited information. By itself, a single Conv1D leaves out substantial information.
Nonetheless, since a Conv2D can be 'decomposed' into two Conv1D blocks (similar in spirit to the Pointwise & Depthwise convolutions in the MobileNet architecture), concatenating a vertical Conv1D and a horizontal Conv1D captures the spatial correlation on both axes. This is a valid approach to image classification as an alternative to Conv2D.
Yes, we can.
You should not reshape the data to reduce dimensions: if you did, you would be taping one end of the image (say the top, if the Conv1D is applied vertically) to the other end (say the bottom), which breaks spatial coherence.
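To make the warning concrete, here is a minimal sketch (using the batch shape from the question) of what such a reshape does: the pixel that follows the last pixel of one row comes from the opposite edge of the image, yet a Conv1D sliding over the flattened axis would treat the two as neighbors.

```python
import tensorflow as tf

x = tf.random.normal(shape=(20, 100, 100, 3))  # batch from the question

# Flattening the two spatial axes stitches each row's end
# to the next row's start:
flat = tf.reshape(x, (20, 100 * 100, 3))  # (20, 10000, 3)

# flat[0, 99]  is the right-most pixel of row 0,
# flat[0, 100] is the left-most pixel of row 1 -- pixels from opposite
# image edges, now adjacent along the convolved axis.
```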
Here is a possible example of how to implement the concatenation explained above:
import tensorflow as tf

x = tf.random.normal(shape=(20, 100, 100, 3))  # your input batch

# Horizontal Conv1D: convolves along the width axis
y_h = tf.keras.layers.Conv1D(
    filters=32, kernel_size=3, padding='same', activation='relu')(x)

# Vertical Conv1D: transpose so the height axis is the one convolved
y_v = tf.transpose(x, perm=[0, 2, 1, 3])  # Image rows to columns
y_v = tf.keras.layers.Conv1D(
    filters=32, kernel_size=3, padding='same', activation='relu')(y_v)
y_v = tf.transpose(y_v, perm=[0, 2, 1, 3])  # Undo transpose so both branches align

# Concatenate results on the feature-map axis
y = tf.keras.layers.Concatenate(axis=3)([y_h, y_v])  # shape (20, 100, 100, 64)
Note that you need multiple operations to obtain a result (convolution over the vertical and horizontal axes) that would be easier and faster to get by applying Conv2D directly.
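For comparison, a minimal sketch of the direct Conv2D route (the filter count of 64 is chosen here just to match the 32 + 32 feature maps of the concatenated branches):

```python
import tensorflow as tf

x = tf.random.normal(shape=(20, 100, 100, 3))

# One Conv2D captures both spatial directions in a single operation
y = tf.keras.layers.Conv2D(
    filters=64, kernel_size=3, padding='same', activation='relu')(x)

print(y.shape)  # (20, 100, 100, 64)
```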
If your image data is particularly uninformative along one spatial axis while being particularly interesting along the other, this might be an idea worth exploring. Otherwise it is better to resort to the standard Conv2D (which covers most cases out there, including almost all public image datasets).
Upvotes: 2
Reputation: 4903
I assume you mean 1x1 convolutions, which convolve images across channels. In your case the layer code would be:
tf.keras.layers.Conv2D(filters=NUM_FILTERS, kernel_size=1, strides=1)
Conv1D is indeed for 1D data processing (like sound), as @MatusDubrava pointed out.
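To illustrate that point, a small sketch of Conv1D on the kind of sequential data it is designed for (the audio shape here is just an assumed example):

```python
import tensorflow as tf

# e.g. a batch of 8 mono audio clips, 16000 samples each
audio = tf.random.normal(shape=(8, 16000, 1))  # (batch, timesteps, channels)

# Conv1D slides along the single time axis
features = tf.keras.layers.Conv1D(
    filters=16, kernel_size=9, strides=4, activation='relu')(audio)

print(features.shape)  # (8, 3998, 16)
```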
Upvotes: 1