Reputation: 25
How would I approach dividing a bitmap into segments and using them for parallel processing? I already have the height and width of the bitmap, but where do I go from here? I've read that I should use MPI_Cart_shift() and MPI_Sendrecv(), but I'm unsure how to approach using them.
width = BMP_GetWidth (bmp);
height = BMP_GetHeight (bmp);
new_bmp = BMP_Create(width, height, 24); // BMP_Create(UINT width, UINT height, USHORT depth)
Upvotes: 0
Views: 337
Reputation: 37222
How I'd approach dividing a bitmap into segments for parallel processing depends on what type of processing is being done.
Your tags (but not your question) mention Gaussian blur, so that's probably a good place to start.
For Gaussian blur, each output pixel depends on lots of input pixels and nothing else. If each processor has a (read only) copy of all input pixels then you can split the work however you like, but "banding" would work best. Specifically, if there are N processors the first processor would find the first group of "total_pixels/N" output pixels (which is likely to be a band of pixels at the top of the image), the second processor would do the second group of "total_pixels/N" output pixels (likely to be a band of pixels just below the first band), etc. Once all the processors are done you'd just append the output pixels from each processor in the right order to get the whole output bitmap.
Note that (due to rounding) some processors may need to do a different number of pixels - e.g. if the bitmap has 10000 pixels and you have 64 processors, then "10000/64 = 156.25" but a processor can't do a quarter of a pixel, so you end up with 48 processors doing 156 pixels while 16 processors do 157 pixels ("48*156 + 16*157 = 10000").
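As a rough sketch in plain C (the function name and arguments are just made up for illustration, not from any library), the split works out like this:

/* Split `total` rows (or pixels) across `nprocs` processors so that the
 * counts differ by at most one; the first (total % nprocs) processors
 * take one extra item each. */
void split_work(int total, int nprocs, int rank, int *start, int *count)
{
    int base      = total / nprocs;   /* e.g. 10000 / 64 = 156 */
    int remainder = total % nprocs;   /* e.g. 10000 % 64 = 16  */

    *count = base + (rank < remainder ? 1 : 0);                     /* 156 or 157      */
    *start = rank * base + (rank < remainder ? rank : remainder);   /* first row/pixel */
}

Once every processor has finished its band, the same counts (and the matching offsets) can be reused to stitch the output back together in order, e.g. with MPI_Gatherv.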
Also, if processors may be different speeds and/or different latencies, you might want to split the work into more pieces (e.g. if there are 64 processors split the work into 128 pieces, where slower processors might only do 1 piece while faster processors might do 4 pieces).
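One way to get that kind of balancing with MPI is to over-decompose and hand pieces out on demand, e.g. with rank 0 acting as a coordinator. A minimal sketch (TAG_WORK/TAG_STOP, the piece numbering and blur_piece() are all made-up placeholders):

#include <mpi.h>

#define TAG_WORK 1
#define TAG_STOP 2

/* Rank 0: hand out `npieces` pieces to whichever worker asks next, then
 * answer each worker's final request with a stop message. nworkers would
 * be nprocs - 1 because rank 0 only coordinates here. */
void master(int npieces, int nworkers)
{
    MPI_Status st;
    int dummy;

    for (int piece = 0; piece < npieces; piece++) {
        MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
        MPI_Send(&piece, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
    }
    for (int w = 0; w < nworkers; w++) {
        MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
        MPI_Send(&dummy, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
    }
}

/* Every other rank: keep asking for pieces until told to stop, so faster
 * processors naturally end up doing more pieces. */
void worker(void)
{
    int piece, dummy = 0;
    MPI_Status st;

    for (;;) {
        MPI_Send(&dummy, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);   /* ask for work */
        MPI_Recv(&piece, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_STOP)
            break;
        /* blur_piece(piece);  <- hypothetical helper: process this band/tile */
    }
}

Note that rank 0 does no pixel work in this sketch, so whether the extra balancing is worth giving up one processor depends on how big the pieces are.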
If the processors don't already have a copy of all input pixels (and if there's no shared memory) then you can send each processor a fraction of all pixels. For example, if you have a Gaussian matrix that's 7 rows high (3 rows above the output position, one row at the output position and 3 rows below the output position), and if each processor outputs a band of 100 rows of pixels, then you'd send each processor a "3+100+3 = 106" band of input pixels to work on (except for the processors that do the first band and last band, which would only get "3+100" or "100+3" rows of input pixels).
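An alternative to having one processor send out those overlapping rows is to give each processor only its own 100 rows and let neighbouring processors swap the 3 boundary rows directly; that's the usual job for the MPI_Sendrecv you mentioned. A rough sketch, assuming 24-bit pixels stored row by row, a radius of 3 rows to match the 7-row kernel, and a band buffer laid out as [3 ghost rows above][own rows][3 ghost rows below] (the names band, local_rows and row_bytes are just illustrative):

#include <mpi.h>

#define HALO 3   /* rows above/below needed by a 7-row Gaussian kernel */

void exchange_halo(unsigned char *band, int local_rows, int row_bytes,
                   int rank, int nprocs)
{
    /* The first and last bands have no neighbour on one side; MPI_PROC_NULL
     * turns that send/receive into a no-op. */
    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    unsigned char *top_ghost    = band;                                   /* rows received from above */
    unsigned char *own_first    = band + HALO * row_bytes;                /* my first HALO rows       */
    unsigned char *own_last     = band + local_rows * row_bytes;          /* my last HALO rows        */
    unsigned char *bottom_ghost = band + (HALO + local_rows) * row_bytes; /* rows received from below */

    /* Send my top rows up, receive the ghost rows below me from the rank below. */
    MPI_Sendrecv(own_first, HALO * row_bytes, MPI_UNSIGNED_CHAR, up, 0,
                 bottom_ghost, HALO * row_bytes, MPI_UNSIGNED_CHAR, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Send my bottom rows down, receive the ghost rows above me from the rank above. */
    MPI_Sendrecv(own_last, HALO * row_bytes, MPI_UNSIGNED_CHAR, down, 1,
                 top_ghost, HALO * row_bytes, MPI_UNSIGNED_CHAR, up, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

(MPI_Cart_shift() is just a convenient way to get the up/down ranks from a Cartesian communicator instead of computing them by hand as above.)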
For something like (e.g.) Floyd–Steinberg dithering things get a lot more complicated because one output pixel depends on all previous output pixels (in addition to the input pixels). In this case you can split the "3 colour" bitmap into three separate monochrome bitmaps (one for each processor, for up to 3 processors) and each processor can dither its monochrome bitmap, and then you can merge the three resulting monochrome bitmaps back together to get a single "3 colour" output bitmap; but it's virtually impossible to use more than 3 processors (without changing to a different dithering algorithm that's more suited to parallelisation).
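A sketch of that per-channel split and merge, assuming the pixels are interleaved 24-bit RGB in memory (split_channel/merge_channel are just illustrative helpers; each of the three processors would dither one of the resulting monochrome planes, and the dithering itself isn't shown):

/* Copy one colour channel (0 = R, 1 = G, 2 = B) out of an interleaved
 * 24-bit RGB buffer into a single-channel bitmap. */
void split_channel(const unsigned char *pixels, unsigned char *channel,
                   int width, int height, int which)
{
    for (long i = 0; i < (long)width * height; i++)
        channel[i] = pixels[3 * i + which];
}

/* Write a dithered single-channel bitmap back into its slot in the
 * interleaved 24-bit RGB output. */
void merge_channel(unsigned char *pixels, const unsigned char *channel,
                   int width, int height, int which)
{
    for (long i = 0; i < (long)width * height; i++)
        pixels[3 * i + which] = channel[i];
}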
For drawing one circle or one ellipse, you might get each processor to draw an arc and combine the arcs; for drawing 1234 shapes you might split the image into a grid and get each processor to do a tile within the grid.
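For the grid approach, every processor can compute the same tile boundaries independently and just pick the tiles assigned to it (tiles_x, tiles_y and the tile-to-processor mapping are arbitrary choices for the sketch):

/* Bounds of tile t in a tiles_x * tiles_y grid over a width x height image;
 * x1/y1 are exclusive. Tile t could go to processor t % nprocs, for example. */
void tile_bounds(int width, int height, int tiles_x, int tiles_y, int t,
                 int *x0, int *y0, int *x1, int *y1)
{
    int tx = t % tiles_x;
    int ty = t / tiles_x;

    *x0 = tx * width / tiles_x;
    *y0 = ty * height / tiles_y;
    *x1 = (tx + 1) * width / tiles_x;
    *y1 = (ty + 1) * height / tiles_y;
}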
Upvotes: 1