Ali Ghadimi

Reputation: 23

Flattening image matrices for deep learning

I have a question about flattening image matrices, in this case from shape (64 x 64 x 3) to a vector of shape (12288 x 1).

I understand that the pixels of each image are laid out in a (64 x 64) matrix and, if I'm right, each element of this matrix is a vector of length 3 holding the R, G, B values for that single pixel. So the first row below holds the R, G, B values for the top-left pixel:

train_set[0]
>> array([[[17, 31, 56],
        [22, 33, 59],
        [25, 35, 62],

My question starts from here:

When we flatten the first image's data (in a dataset of hundreds of samples) using the following code:

train_set_flatten = train_set.reshape(train_set.shape[0], -1).T

the first 3 elements of the first column of train_set_flatten are the R, G, B values for the first pixel:

train_set_flatten[:,0][0:10]
array([17, 31, 56, 22, 33, 59, 25, 35, 62, 25], dtype=uint8)

But according to some textbooks, we are supposed to list all the elements of the "R matrix" first, then "G", and then "B". What I have now is not in that order. Is my vector correct, or do I need to find another way to flatten the matrix?

Please see the instructions from Neural Networks and Deep Learning by DeepLearning.AI on coursera.org.

Upvotes: 0

Views: 1038

Answers (2)

Ali Ghadimi

Reputation: 23

I found the answer. Quoting Paul Mielke from https://community.deeplearning.ai/:

In the following line of code:

train_set.reshape(train_set.shape[0], -1).T

we can add order='F' or order='C' (the default):

train_set.reshape(train_set.shape[0], -1, order='F').T  # or order='C'

The way to think about the difference between “C” and “F” order for an image is to remember that the highest dimension is the RGB color dimension. So what that means is that with “C” order you get all three colors for each pixel together. With “F” order, what you get is all the Red pixel values in order across and down the image, followed by all the Green pixels, followed by all the Blue pixels. So it’s like three separate monochrome images back to back. It’s worth trying the experiment of using “F” order on all your reshapes and then running the training and confirming that you get the same accuracy results. In other words (as I said in my previous post), the algorithm can learn the patterns either way. It just matters that you are consistent in how you do the unrolling. (Paul Mielke)
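
To make the difference between the two orders concrete, here is a minimal sketch on a toy batch. The array contents and the 2 x 2 image size are made up for illustration; only `reshape` and its `order` argument come from the code above.

```python
import numpy as np

# Hypothetical toy batch: 2 images of 2 x 2 pixels with 3 channels,
# laid out as (m, height, width, channels). Values are just 0..23.
m, H, W, C = 2, 2, 2, 3
train_set = np.arange(m * H * W * C, dtype=np.uint8).reshape(m, H, W, C)

# Default C order: R, G, B of each pixel stay together.
flat_c = train_set.reshape(m, -1).T

# F order: all R values first, then all G, then all B.
flat_f = train_set.reshape(m, -1, order='F').T

print(flat_c[:, 0])  # first image, C order
print(flat_f[:, 0])  # first image, F order
```

In this sketch the first image comes out as 0, 1, 2, ..., 11 in C order and as 0, 6, 3, 9, 1, 7, 4, 10, 2, 8, 5, 11 in F order. Note that within each colour plane, F order walks down each column before moving to the next one, so it is not a row-by-row scan. Each column of the result still holds exactly one image in both cases.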

I trained a model with both order='F' and order='C', and the result was the same.
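
One way to see why consistency is what matters: the two orders give the same features, only permuted. The sketch below checks this for a single linear unit with made-up random weights. It is an illustration under those assumptions, not a proof about trained accuracy, which also depends on initialisation and other factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random batch: 4 images of 2 x 2 pixels with 3 channels.
m, H, W, C = 4, 2, 2, 3
images = rng.integers(0, 256, size=(m, H, W, C)).astype(np.float64)

X_c = images.reshape(m, -1).T             # features in C order
X_f = images.reshape(m, -1, order='F').T  # features in F order

# Each image has the same set of values either way, just reordered.
same_values = np.array_equal(np.sort(X_c, axis=0), np.sort(X_f, axis=0))

# Hypothetical weights for one linear unit, one weight per pixel channel,
# flattened with the same order as the inputs they multiply.
w = rng.standard_normal((H, W, C))
z_c = w.reshape(-1) @ X_c
z_f = w.reshape(-1, order='F') @ X_f

print(same_values)
print(np.allclose(z_c, z_f))
```

As long as the weights are unrolled the same way as the inputs, the unit computes the same output, so a fully connected model can learn equivalent weights under either ordering.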

Upvotes: 2

cnp

Reputation: 339

I think that it depends on your model design. If you design your model's inputs as three separate arrays, one per channel (R, G, B), you can try my approach below: separate the channels first and reshape them afterwards.

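import numpy as np

# Three pixels, one per row; the columns are the R, G, B values
a = np.array([[17, 31, 56],
              [22, 33, 59],
              [25, 35, 62]])

# Split into one 1-D array per channel
R = a[:, 0]
G = a[:, 1]
B = a[:, 2]

# Reshape each channel to shape (1, number_of_pixels)
R = R.reshape(R.shape[0], -1).T
G = G.reshape(G.shape[0], -1).T
B = B.reshape(B.shape[0], -1).T

print(R)
print(G)
print(B)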
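
The same idea can be applied to a whole batch shaped (m, height, width, 3), as in the question. This is a sketch on a made-up toy batch; the variable names and the step of stacking the channels back into one vector are assumptions for illustration.

```python
import numpy as np

# Hypothetical toy batch: 2 images of 2 x 2 pixels with 3 channels.
m, H, W = 2, 2, 2
train_set = np.arange(m * H * W * 3, dtype=np.uint8).reshape(m, H, W, 3)

# Take each channel plane and flatten it row by row: shape (H*W, m).
R = train_set[..., 0].reshape(m, -1).T
G = train_set[..., 1].reshape(m, -1).T
B = train_set[..., 2].reshape(m, -1).T

# Optional: stack the planes so each column is all R, then all G, then all B.
stacked = np.concatenate([R, G, B], axis=0)  # shape (3*H*W, m)

# Equivalent one-liner: move the channel axis in front of height and width.
alt = np.moveaxis(train_set, -1, 1).reshape(m, -1).T

print(stacked[:, 0])
```

Here each colour plane is scanned row by row (across, then down), and every column of `stacked` still corresponds to one image.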

Upvotes: 1
