Supratim Haldar

Reputation: 2426

Interpretation of a numpy ndarray

Suppose I want to represent an image of size H*W with 3 color channels (RGB) in a numpy 3-D array, such that the dimension is (H, W, 3). Let's take a simple example of (4,2,3). So we create an array like this - img = np.arange(24).reshape(4,2,3).

In order to fit the analogy of the above image example, the values of the elements should be -

Channel R: [0,1],[2,3],[4,5],[6,7]
Channel G: [8,9],[10,11],[12,13],[14,15]
Channel B: [16,17],[18,19],[20,21],[22,23]

i.e., 3 outer arrays, with the above rows nested inside.

However, the result of np.arange(24).reshape(4,2,3) is -

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23]]])

If I want the first row of the first channel, i.e. img[0,:,0], I would expect [0,1] as the result, but I actually get [0,3] back.

I understand that if I initialize the ndarray with shape (3,4,2), I will get what I am looking for. But I want to work with the conventional shape of (H,W,depth).

Can you please help me understand the gap in my understanding?

Upvotes: 1

Views: 130

Answers (1)

Xenon

Reputation: 123

I think your misunderstanding arises because you (wrongly) assume that reshaping the flat vector into the array varies the first index fastest. In fact, NumPy's default C (row-major) order varies the last index fastest. In your example, the order in which the array is filled is

0 -> [0,0,0]

1 -> [0,0,1]

2 -> [0,0,2]

3 -> [0,1,0] etc.
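You can check this filling order yourself with np.unravel_index, which maps a flat position to the multi-index it occupies under the default C order (a small sketch, not part of the original question):

```python
import numpy as np

shape = (4, 2, 3)

# The last axis varies fastest, so flat indices 0,1,2 stay in the
# first pixel and flat index 3 jumps to the next pixel: (0, 1, 0).
for flat in range(4):
    print(flat, '->', np.unravel_index(flat, shape))

# Since arange makes each element's value equal its flat index,
# the element at multi-index (0, 1, 0) is 3.
img = np.arange(24).reshape(shape)
print(img[0, 1, 0])  # 3
```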

Thus, the first pixel is [0,1,2], the second pixel is [3,4,5] and you get exactly the results you see.

The misunderstanding lies exclusively in how the flat vector is mapped into such an array (and how it is stored behind the scenes). Once you have defined the image, everything behaves as you expect.
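If you do want the channel-major values from the question while keeping the conventional (H, W, 3) shape, one way (a sketch, not something the question requires) is to fill a (3, H, W) array first and then transpose, which is only a view and copies no data:

```python
import numpy as np

# Fill channel-major: channel R gets 0..7, G gets 8..15, B gets 16..23,
# then move the channel axis to the end to obtain shape (4, 2, 3).
img = np.arange(24).reshape(3, 4, 2).transpose(1, 2, 0)

print(img.shape)     # (4, 2, 3)
print(img[0, :, 0])  # [0 1] -- the first row of channel R, as expected
```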

As an aside: You may indeed encounter images which are saved with size [3,X,Y] instead, as hpaulj commented.
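If you receive such a channel-first [3,X,Y] image, np.moveaxis converts it to the conventional channel-last layout as a view (a sketch with made-up data, assuming the channel axis comes first):

```python
import numpy as np

# A hypothetical channel-first image of shape (3, H, W).
chw = np.arange(24).reshape(3, 4, 2)

# Move axis 0 (channels) to the end -> shape (H, W, 3), no copy.
hwc = np.moveaxis(chw, 0, -1)
print(hwc.shape)  # (4, 2, 3)
```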

Upvotes: 2
