batuman

Reputation: 7304

Similar Image shape conversion in Python and OpenCV

I am new to Python and am having difficulty understanding the image shape conversion in Python.

In my Python code, the image I has the following shape:

ipdb> I.shape
(720, 1280, 3)

Running the following command transposes and reshapes I and stores the result in h5_image:

 h5_image = np.transpose(I, (2,0,1)).reshape(data_shape)

Where data_shape is:

 ipdb> p data_shape
 (1, 3, 720, 1280)

  1. Which OpenCV function produces the same output?

  2. In (1, 3, 720, 1280), what does 1 mean?

  3. What is the difference between (3, 720, 1280) and (720, 1280, 3)?

Upvotes: 0

Views: 620

Answers (1)

Elad Joseph

Reputation: 3058

You can think of an image (I) in Python/NumPy as an array with N dimensions.

  • A grayscale image has a single value for each row and column. This means 2 dimensions, and the shape is: I.shape --> (rows, cols)
  • An RGB image has 3 channels: red, green, blue. So there are 3 dimensions in total: I.shape --> (rows, cols, 3)
  • An RGBA image has 4 channels: red, green, blue, alpha. Still 3 dimensions: I.shape --> (rows, cols, 4)
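
As a quick check of the three layouts listed above, here is a minimal sketch using zero-filled NumPy arrays as stand-ins for real images. The names gray, rgb and rgba are only for illustration, and the 720x1280 size is borrowed from the question.

```python
import numpy as np

# Zero-filled placeholders standing in for decoded images (hypothetical data).
gray = np.zeros((720, 1280), dtype=np.uint8)       # (rows, cols)
rgb = np.zeros((720, 1280, 3), dtype=np.uint8)     # (rows, cols, 3)
rgba = np.zeros((720, 1280, 4), dtype=np.uint8)    # (rows, cols, 4)

for name, img in (("gray", gray), ("rgb", rgb), ("rgba", rgba)):
    # ndim is the number of dimensions, shape is the size along each one.
    print(name, img.ndim, img.shape)
```

The colour images add only one extra axis, and the number of channels shows up as the size of that last axis.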

These are the common ways to store image data, but of course you can store it any way you like, as long as you know how to read it back. For example, you can keep it as one long 1-dimensional vector together with the image width and height, so you know how to reshape it into 2D.
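
The flat-vector idea can be sketched like this, assuming a tiny 4x6 grayscale image and NumPy's default row-major order. The names height, width, flat and restored are made up for the example.

```python
import numpy as np

height, width = 4, 6  # hypothetical image size, kept small for readability

# A small grayscale "image" whose pixel values are just 0..23.
image = np.arange(height * width, dtype=np.uint8).reshape(height, width)

# Store it as one long 1-D vector (row after row).
flat = image.ravel()

# With the height and width kept alongside, the 2-D form can be rebuilt.
restored = flat.reshape(height, width)

# In row-major order, pixel (row, col) sits at index row * width + col.
row, col = 2, 3
print(flat[row * width + col], image[row, col])
```

Without the stored width and height the vector would be ambiguous, since 24 values could equally be a 4x6, 6x4 or 3x8 image.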

For your more specific questions:

  1. I am not sure what output you are looking for. OpenCV also provides transpose() and flip().
  2. The (1, 3, 720, 1280) only means you have an additional degenerate dimension of size 1. To access each pixel you have to write I[0,channel,row,col], because 0 is the only valid index along that axis. The 1 is unnecessary for a single image, and it is not a common way to hold an image array. Why do you want to do this? Do you want to save in a specific format? (HDF5?)
  3. The only difference is in the arrangement of your data. For example, assuming the channels are stored in RGB order, in the case of (3, 720, 1280) you get the red channel with red = I[0,:,:], while in the case of (720, 1280, 3) you write red = I[:,:,0] (this is more common).
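
To make points 2 and 3 concrete, here is a minimal NumPy sketch that reproduces the conversion from the question on a synthetic array. The values in I are arbitrary counters rather than real pixel data, and chw is a name introduced only for this example.

```python
import numpy as np

# Synthetic stand-in for the image from the question: (rows, cols, channels).
I = np.arange(720 * 1280 * 3, dtype=np.uint32).reshape(720, 1280, 3)

# Move the channel axis to the front: (720, 1280, 3) -> (3, 720, 1280).
chw = np.transpose(I, (2, 0, 1))

# Add the leading size-1 axis, as in the question.
data_shape = (1, 3, 720, 1280)
h5_image = chw.reshape(data_shape)

# The same element is reachable in every layout; only the index order changes.
row, col, channel = 10, 20, 1
print(I[row, col, channel] == chw[channel, row, col] == h5_image[0, channel, row, col])

# Selecting one whole channel plane in each layout gives the same 720x1280 array.
plane_hwc = I[:, :, 0]
plane_chw = chw[0, :, :]
print(np.array_equal(plane_hwc, plane_chw))
```

Note that the leading axis is indexed with 0. Writing chw[np.newaxis, ...] would add the same size-1 axis without having to spell out data_shape.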

*There are some performance issues that depend on the actual arrangement of the image data in memory, but I don't think you need to worry about this right now.
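
For the curious, the memory arrangement can be inspected through NumPy's strides. This is only a sketch on a zero-filled uint8 array, and the names hwc, chw_view and chw_copy are made up for the example.

```python
import numpy as np

hwc = np.zeros((720, 1280, 3), dtype=np.uint8)

# transpose returns a view: the bytes in memory are not moved,
# only the strides (bytes to step along each axis) are reordered.
chw_view = np.transpose(hwc, (2, 0, 1))

print(hwc.strides)                       # (3840, 3, 1)
print(chw_view.strides)                  # (1, 3840, 3)
print(np.shares_memory(hwc, chw_view))   # True
print(chw_view.flags["C_CONTIGUOUS"])    # False

# An explicit copy lays each channel plane out contiguously.
chw_copy = np.ascontiguousarray(chw_view)
print(chw_copy.strides)                  # (921600, 1280, 1)
```

Code that walks over a whole channel plane may run faster on the contiguous copy, at the cost of the extra memory and the time spent copying.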

Upvotes: 3
