Reputation: 63
I am a beginner with TensorFlow. I just tried to fit a simple LeNet-5 to the MNIST data.
My training and test data start out as NumPy arrays, e.g. with shape (60000, 28, 28). Then I defined my model as below.
from tensorflow.keras import Sequential, layers

model_LeNet5 = Sequential([
    # input_shape describes one sample: (height, width, channels)
    layers.Conv2D(6, kernel_size=3, strides=1, input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.ReLU(),
    layers.Conv2D(16, kernel_size=3, strides=1),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.ReLU(),
    layers.Flatten(),
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dense(10)  # one output per digit class, no activation
])
I can understand why it works when I set input_shape to (28, 28) or train_images.shape[1:], but I cannot understand why input_shape=(28, 28, 1) also works (as in the code above).
There seems to be an inconsistency between the shape of the data and the input size setting (i.e., [60000, 28, 28] vs. [28, 28, 1]). Broadcasting rules also do not seem to link [60000, 28, 28] with [28, 28, 1]. Thanks to anyone who can explain how input_shape works.
Upvotes: 2
Views: 264
Reputation: 364
A single grayscale image can be represented using a two-dimensional (2D) NumPy array or a tensor. Since there is only one channel in a grayscale image, we don’t need an extra dimension to represent the color channel. The two dimensions represent the height and width of the image. A batch of 3 grayscale images can be represented using a three-dimensional (3D) NumPy array or a tensor. Here, we need an extra dimension to represent the number of images.
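To make the shapes concrete, here is a minimal NumPy sketch. It uses zero-filled stand-in arrays rather than the real MNIST data, and the variable names are only illustrative. In Keras, input_shape describes a single sample and leaves out the batch dimension, and Conv2D expects each sample to carry an explicit channel axis, which is where (28, 28, 1) comes from. My understanding is that some TensorFlow/Keras versions add a trailing axis of size 1 on their own when the data has one dimension fewer than the model expects, which would explain why fitting (60000, 28, 28) data worked, but I would treat that as an assumption and add the axis explicitly:

```python
import numpy as np

# Stand-in arrays with the same layout as the data in the question
# (zeros instead of real MNIST pixels, and a batch of 3 instead of 60000).
single_image = np.zeros((28, 28), dtype=np.float32)   # (height, width)
batch = np.zeros((3, 28, 28), dtype=np.float32)       # (batch, height, width)

# Add an explicit channel axis at the end: (batch, height, width, channels).
batch_with_channel = np.expand_dims(batch, axis=-1)

print(single_image.shape)            # (28, 28)
print(batch.shape)                   # (3, 28, 28)
print(batch_with_channel.shape)      # (3, 28, 28, 1)

# The per-sample shape (everything after the batch axis) is what
# input_shape refers to.
print(batch_with_channel.shape[1:])  # (28, 28, 1)

# Adding the channel axis does not change any pixel values.
assert np.array_equal(batch_with_channel[..., 0], batch)
```

Reshaping the training and test arrays this way before calling fit keeps the data and input_shape=(28, 28, 1) visibly consistent, instead of relying on any automatic handling.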
For more information, check out this article on towardsdatascience.
Upvotes: 1