johni07

Reputation: 771

Load png image for embedded system

I am working on an embedded deep learning inference C++ project using TensorRT. My model requires the mean image to be subtracted from the input.

The api that I'm using allows me to define a mean image with the following data structure for rgb images:

uint8_t *data[DW_MAX_IMAGE_PLANES];       // raw image data 
size_t pitch;                             // pitch of the image in bytes
uint32_t height;                          // height of the image in px
uint32_t width;                           // image width in px
uint32_t planeCount;                      // plane count of the image

So far I have found the LodePNG library, which I think is quite useful for this task. It can load PNGs with just a few lines:

// Load file and decode image.
std::vector<unsigned char> image;
unsigned width, height;
unsigned error = lodepng::decode(image, width, height, filename);

The question now is how to convert std::vector<unsigned char> to uint8_t *[DW_MAX_IMAGE_PLANES] and calculate the pitch and planeCount values?

As I'm using RGB images, DW_MAX_IMAGE_PLANES equals 3.

Upvotes: 0

Views: 859

Answers (1)

D Krueger

Reputation: 2476

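The values for pitch and planeCount are simple. Because the image is RGB, the value of planeCount is 3, one plane for each color. Since LodePNG's decode defaults to bitdepth = 8, each sample occupies 1 byte. If pitch means the number of bytes per image row, as the term is conventionally used, then a tightly packed 8-bit plane has a pitch of width * 1 = width bytes.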

Since you are not using the alpha channel, you should probably have LodePNG simply decode into RGB format directly:

unsigned error = lodepng::decode(image, width, height, filename, LCT_RGB);

But once the image is decoded into the std::vector<unsigned char>, you will not be able to use it directly. The decoded data from LodePNG is in the following format:

image -> R0, G0, B0, R1, G1, B1, R2, G2, B2, ...

But you need it in the following format:

data[0] -> R0, R1, R2, ...
data[1] -> G0, G1, G2, ...
data[2] -> B0, B1, B2, ...

If you are memory constrained, you'll have to rearrange the values in the image vector in place into planar order (R0, R1, ... Rn, G0, G1, ... Gn, B0, B1, ... Bn) and then calculate the pointers to the start of each plane to initialize the data array.
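A minimal sketch of the in-place rearrangement, assuming the vector holds exactly 3 bytes per pixel (decoded with LCT_RGB at 8 bits). The function name deinterleaveInPlace is hypothetical, not part of LodePNG or of your API. It follows the cycles of the permutation that sends interleaved index i to (i % 3) * n + i / 3, and it uses a std::vector<bool> to mark finished positions, which costs about one extra bit per byte of image data.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: reorders R0,G0,B0,R1,G1,B1,... into
// R0..Rn-1, G0..Gn-1, B0..Bn-1 inside the same buffer.
void deinterleaveInPlace(std::vector<unsigned char>& image) {
    if (image.size() % 3 != 0) {
        throw std::invalid_argument("expected 3 bytes per pixel");
    }
    const std::size_t n = image.size() / 3;   // number of pixels
    std::vector<bool> placed(image.size(), false);

    for (std::size_t start = 0; start < image.size(); ++start) {
        if (placed[start]) continue;
        std::size_t cur = start;
        unsigned char carried = image[start];
        do {
            // Destination of the sample that originally sat at index cur.
            const std::size_t dest = (cur % 3) * n + cur / 3;
            std::swap(carried, image[dest]);
            placed[dest] = true;
            cur = dest;
        } while (cur != start);
    }
}
```

After the call, the three plane pointers would be image.data(), image.data() + n, and image.data() + 2 * n, where n is width * height. The vector must stay alive and unmodified for as long as those pointers are in use.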

If you have available memory, you can create separate vectors for each of the three color channels. Then copy the data from the decoded image and initialize the data array with pointers to the first element of the vectors.
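The separate-vector approach could look roughly like this. It is a sketch under the assumption that the image was decoded with LCT_RGB at 8 bits, so the vector holds exactly 3 * width * height bytes. The names PlanarRgb and splitChannels are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container: one tightly packed buffer per color channel.
struct PlanarRgb {
    std::vector<std::uint8_t> r, g, b;
};

// Copies interleaved R0,G0,B0,R1,G1,B1,... samples into three planes.
PlanarRgb splitChannels(const std::vector<unsigned char>& image,
                        unsigned width, unsigned height) {
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    if (image.size() != pixels * 3) {
        throw std::invalid_argument("expected 3 bytes per pixel (RGB, 8 bit)");
    }
    PlanarRgb planes;
    planes.r.resize(pixels);
    planes.g.resize(pixels);
    planes.b.resize(pixels);
    for (std::size_t i = 0; i < pixels; ++i) {
        planes.r[i] = image[3 * i];      // red sample of pixel i
        planes.g[i] = image[3 * i + 1];  // green sample of pixel i
        planes.b[i] = image[3 * i + 2];  // blue sample of pixel i
    }
    return planes;
}
```

The data array can then be filled with planes.r.data(), planes.g.data(), and planes.b.data(). The PlanarRgb object has to outlive every use of those pointers, and the decoded image vector can be released once the copy is done.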

Upvotes: 1
