Reputation: 25
I have been reading about the image compression using k x k
boxes which states:
We split the image into
k x k
boxes, and form a new image by taking the average on each block. This way we reduce the file size fromMN
toMN/k^2
pixels.
I am trying to visualize how to write my algorithm. Do I have to loop over k
pixels in x
direction first and then y
direction, picking let's say 3 x 3
in the x
direction and putting an average in the spot?
Does this sound like the solution? I'd be glad if someone could write up a pseudo algorithm.
Upvotes: 2
Views: 323
Reputation: 104464
What gives it away to me is the reduction of the file size from MN
boxes down to MN/k^2
boxes. This means that you are choosing k x k
distinct boxes, finding the average of each block, then setting the centre of each block to be the average.
Let's start with a small numerical example. Let's say I had the following 6 x 6 grayscale image and let's make k = 3
, which means that you want 3 x 3
distinct boxes:
7 5 4 8 7 1
3 10 8 2 9 2
3 7 5 2 8 5
6 5 1 6 7 10
8 4 4 6 8 9
1 3 4 7 8 7
I chose the following dimensions on purpose so that we can compute a clean example. Take note that if you have an image whose rows or columns don't evenly divide into k
, you'll have to consider what happens when you have blocks that don't contain all valid pixels. Some people either fill these in with zeroes, or do some sort of intelligent padding but to make things easy for you, assume that these values are 0.
You have to segment this image into 3 x 3
boxes. You have to loop both vertically and horizontally and collect 3 x 3 boxes. This means that you'll get 4 boxes in total. Scanning horizontally first than vertically after, we get the following boxes.
7 5 4
3 10 8
3 7 5
8 7 1
2 9 2
2 8 5
6 5 1
8 4 4
1 3 4
6 7 10
6 8 9
7 8 7
To compute the output image, find the average of each block, then write the average in a new location in the output image. The 6 x 6
image now gets reduced to 6 / 3 x 6 / 3 = 2 x 2
where each location finds the average of each distinct image block. The centre locations are marked as follows:
7 5 4 8 7 1
3 |10| 8 2 |9| 2
3 7 5 2 8 5
6 5 1 6 7 10
8 |4| 4 6 |8| 9
1 3 4 7 8 7
We now find the average of each block. For the average of the first block, we get:
(7 + 5 + 4 + 3 + 10 + 8 + 3 + 7 + 5) / 9 = 5.7778
If you repeat this for the rest of the blocks, we get the following output image:
5.7778 4.8889
4.0000 7.5556
Now that we've established the fundamentals, there are many ways you can do this. The most canonical way is what you have mentioned. Look over each distinct block, find the average and write this to the write location in the output image. Something like this may be what you're looking for, assuming that your image is stored in A
:
A = imread('...'); %// Read in the image
k = 3; %// Change to whatever suits your needs
rows = size(A,1); cols = size(A,2); %// Get rows and columns of the image
channels = size(A,3); %// Total number of channels
%// Pad the image so that boxes at the end have zeroes
Apad = zeros(ceil(rows/k)*k, ceil(cols/k)*k, channels);
Apad(1:rows, 1:cols, :) = double(A); %// Cast to double for precision
%// Create output image
B = zeros(ceil(rows/k), ceil(cols/k), channels);
%// Find the average of each block
for ii = 1 : size(B,1)
for jj = 1 : size(B,2)
for kk = 1 : size(B,3)
block = Apad((ii-1)*k + 1 : ii*k, (jj-1)*k + 1 : jj*k, kk);
B(ii,jj,kk) = mean(block(:));
end
end
end
%// Convert output image back to original input type
B = cast(B, class(A));
The code is very self explanatory. First, read in the image, select the value of k
, then get the rows, columns and number of channels. We then create a new padded image in case the rows and columns aren't evenly divisible by k
. We then place the original image inside this new padded image and we cast the image to double
to maintain division precision.
We also create an output image that is the right size to accommodate for each output k x k
block. After, we loop and select each distinct block and find the average. Take special case of how we index into the padded image in order to get the right block.
Once you have accomplished this averaging, it's very important that you convert the output image back to the original image type. If you don't do this, then using something like imshow
to display the image will render many pixels to be only black and white because imshow
expects the dynamic range to be between 0 and 1.
There are more efficient ways we can do this however. If you're starting out, definitely keep the code above, but one way I'd tackle this would be to use im2col
. What'll happen here is that the pixel neighbourhoods are constructed in a column-major format, so columns of each pixel neighbourhood are stacked into a single column. You would take all of these stacked columns and place them into a 2D matrix. In our case, the number of rows will be 9 (i.e. 3 x 3
), while we will have as many columns as there are valid image blocks.
How the blocks are obtained are again in column major format. Starting from the top left of the image, 3 x 3
pixel neighbourhoods are gathered progressing down the rows. Once we reach the bottom of the matrix, we then move to the next column, then progress down the rows again. This behaviour of how im2col
works is vital for this averaging is to work.
Once we get this matrix, simply find the average of each column which will produce a single vector, then reshape
this back into the desired output matrix.
Something like this comes to mind. Note that most of the code remains the same as we need this for setup:
A = imread('...'); %// Read in the image
k = 3; %// Change to whatever suits your needs
rows = size(A,1); cols = size(A,2); %// Get rows and columns of the image
channels = size(A,3); %// Total number of channels
%// Pad the image so that boxes at the end have zeroes
Apad = zeros(ceil(rows/k)*k, ceil(cols/k)*k, channels);
Apad(1:rows, 1:cols, :) = double(A); %// Cast to double for precision
%// Create output image
B = zeros(ceil(rows/k), ceil(cols/k), channels);
%// Do the average
for ii = 1 : channels
M = im2col(Apad(:,:,ii), [k k], 'distinct');
B(:,:,ii) = reshape(mean(M,1), [size(B,1), size(B,2)]);
end
%// Convert output image back to original input type
B = cast(B, class(A));
Note that I still had to loop over each channel because im2col
only accepts 2D matrices, so we'd have to access the image on a plane-by-plane basis.
Even shorter, you can do this with blockproc
:
B = blockproc(Apad, [3 3], @(x) mean(mean(x.data,2),1));
All in all, there are many methods you can try. Just experiment!
Upvotes: 2