Reputation: 23
I converted the MNIST images, which are 28x28 pixels, into tensors with
dataset = MNIST(root='data/', train=True, transform=transforms.ToTensor())
and when I run
img_tensor, label = dataset[0]
print(img_tensor.shape, label)
It says the shape is torch.Size([1, 28, 28]).
Why is it 1x28x28? What does the first dimension mean? And what is the point of 1x28x28 as opposed to 28x28?
Upvotes: 2
Views: 4552
Reputation: 1
The first dimension tracks color channels. The second and third dimensions represent pixels along the height and width of the image, respectively. Since images in the MNIST dataset are grayscale, there's just one channel. Other datasets have images with color, in which case there are three channels: red, green, and blue (RGB).
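As a rough illustration of the channel dimension, here is a minimal sketch that uses random tensors as stand-ins for real images, so nothing has to be downloaded. The 32x32 color image is a hypothetical example, not something from the question.

```python
import torch

# Stand-ins for real images: random values in [0, 1), laid out as (C, H, W).
gray = torch.rand(1, 28, 28)   # one channel, same shape as an MNIST sample
rgb = torch.rand(3, 32, 32)    # three channels, a hypothetical color image

print(gray.shape)  # torch.Size([1, 28, 28])
print(rgb.shape)   # torch.Size([3, 32, 32])

# Indexing the first dimension picks out one channel as an (H, W) grid.
red = rgb[0]
print(red.shape)   # torch.Size([32, 32])
```

The only difference between the two tensors, layout-wise, is the size of the first dimension: 1 for grayscale, 3 for RGB.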
Upvotes: 0
Reputation: 506
PyTorch convolutions operate on tensors laid out as (B, C, H, W): batch, channel, height and width.
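To see how a single image fits into a batched layout, here is a minimal sketch that adds a leading batch dimension and passes the result through a convolution. The layer sizes (8 output channels, 3x3 kernel, padding of 1) are arbitrary choices for illustration, and the random tensor stands in for an MNIST sample.

```python
import torch
import torch.nn as nn

img = torch.rand(1, 28, 28)   # (channel, height, width), like one MNIST sample
batch = img.unsqueeze(0)      # add a batch dimension in front
print(batch.shape)            # torch.Size([1, 1, 28, 28])

# in_channels must match the channel dimension of the input (1 for grayscale).
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
out = conv(batch)
print(out.shape)              # torch.Size([1, 8, 28, 28])
```

In practice a DataLoader usually adds the batch dimension for you by stacking samples, so the per-sample shape stays (1, 28, 28).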
Upvotes: 0
Reputation: 3496
An image stored as a tensor always has 3 dimensions: channels, height and width. The two 28s are the height and width, of course. The 1 in this case is the channel dimension. So what's a channel? In a color image, every pixel is represented by three colors: red, green and blue. Each color gets its own channel, so normally there are 3 (RGB). This makes a color picture's dimensions (3, H, W). So why do you have a 1 there? Because the MNIST images are grayscale and therefore don't need three different color channels to represent the final color. One channel is enough, so for grayscale images the dimensions are (1, H, W).
Here is a picture below to visualize the dimensions:
source: https://commons.wikimedia.org/wiki/File:RGB_channels_separation.png
So you see, for grayscale images you only need one channel. In principle you could drop the size-1 dimension, but PyTorch's convolution layers expect an explicit channel dimension.
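If a plain 28x28 grid is what you need (for plotting, say), the size-1 dimension can be removed and put back without touching the pixel data. A minimal sketch, again with a random tensor standing in for an MNIST sample:

```python
import torch

img = torch.rand(1, 28, 28)       # (C, H, W) with a single channel

flat = img.squeeze(0)             # drop the channel dimension
print(flat.shape)                 # torch.Size([28, 28])

restored = flat.unsqueeze(0)      # put the channel dimension back
print(restored.shape)             # torch.Size([1, 28, 28])

# The values are unchanged; only the shape differs.
print(torch.equal(img, restored))  # True
```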
Upvotes: 6