The im2col algorithm for ND input

Question

I am trying to write my own im2col algorithm for inputs with more than two dimensions. Currently I am looking at the MATLAB im2col implementation. However, I cannot find any documentation regarding what is going on for any input of more than 2 dimensions.

I do get an output if I feed a 3D tensor into the function. However, I don't really understand how you get from 2D to ND. The fact that this isn't mentioned in the documentation suggests that it's something straightforward; still, I don't get it.

Heck, I don't even understand why the size of the output matrix is the size it is.

Amro · Accepted Answer

Let me just start by saying that im2col is only intended for 2D matrices. The fact that it sometimes worked (and by that I mean returned a result without throwing an error) is just a happy coincidence.

Now I took a peek at edit im2col.m, and without studying the code too much, the first line of each of the distinct and sliding methods should give you an intuition of what's happening:

...
if strcmp(kind, 'distinct')
    [m,n] = size(a);
    ...
elseif strcmp(kind,'sliding')
    [ma,na] = size(a);
    ...
end
...

First recall that [s1,s2] = size(arr), where arr is a 3D array, will collapse the sizes of the 2nd and 3rd dimensions into one. Here's the relevant part of doc size:

[d1,d2,d3,...,dn] = size(X) returns the sizes of the dimensions of the array X, provided the number of output arguments n equals ndims(X). If n < ndims(X), di equals the size of the ith dimension of X for 0 < i < n, but dn equals the product of the sizes of the remaining dimensions of X, that is, dimensions n through ndims(X).
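
A quick check of that rule, as a minimal sketch (the 4-by-3-by-2 array is an arbitrary choice for illustration):

```matlab
X = zeros(4, 3, 2);        % arbitrary 3D array
[s1, s2] = size(X);        % two outputs requested for a 3D array
assert(s1 == 4)            % first dimension is reported as-is
assert(s2 == 3*2)          % trailing dimensions are folded into a product

[d1, d2, d3] = size(X);    % three outputs: nothing is folded
assert(isequal([d1 d2 d3], [4 3 2]))
```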



So basically, for an array of size M-by-N-by-P, the function instead thinks it's a matrix of size M-by-(N*P). Now MATLAB has some quirky indexing rules that let you do things like:

>> x = reshape(1:4*3*2,4,3,2)
x(:,:,1) =
     1     5     9
     2     6    10
     3     7    11
     4     8    12
x(:,:,2) =
    13    17    21
    14    18    22
    15    19    23
    16    20    24
>> x(:,:)
ans =
     1     5     9    13    17    21
     2     6    10    14    18    22
     3     7    11    15    19    23
     4     8    12    16    20    24


which is what I think ended up happening here. Here is an example to confirm the behavior of im2col on an RGB image:

% normal case (grayscale image)
>> M = magic(5);
>> B1 = im2col(M, [3 3], 'sliding');

% (RGB image)
>> MM = cat(3, M, M+50, M+100);
>> B2 = im2col(MM, [3 3], 'sliding');
>> B3 = im2col(reshape(MM, [5 5*3]), [3 3], 'sliding');
>> assert(isequal(B2,B3))


Note that B2 and B3 are equal, so basically think of the result of im2col on an array arr = cat(3,R,G,B) to be the same as that of arr = cat(2,R,G,B) (concatenated horizontally).
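
This also accounts for the output size asked about in the question. A sketch of the arithmetic, assuming the documented 'sliding' layout (one column per block position, one row per block element) and using the explicitly reshaped 2D matrix so it does not rely on the accidental 3D behavior:

```matlab
M   = magic(5);
MM  = cat(3, M, M+50, M+100);
blk = [3 3];

% 'sliding' on an m-by-n matrix gives prod(blk) rows and
% (m-blk(1)+1)*(n-blk(2)+1) columns.
[m, n] = size(MM);                                   % m = 5, n = 15
expected = [prod(blk), (m-blk(1)+1)*(n-blk(2)+1)];   % [9 39]

B3 = im2col(reshape(MM, [m n]), blk, 'sliding');
assert(isequal(size(B3), expected))
```

Running each channel separately would give only 3*9 = 27 columns. The 12 extra columns come from blocks that straddle the seam between two horizontally concatenated channels, so they mix pixels from different channels.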

Interestingly, you won't get so lucky with the 'distinct' blocks method:

>> B1 = im2col(M, [3 3], 'distinct')    % works
% ..snip..

>> B2 = im2col(MM, [3 3], 'distinct')   % errors
Subscripted assignment dimension mismatch.
Error in im2col (line 59)
    aa(1:m,1:n) = a; 
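
The error is consistent with the size collapse described earlier. Here is a minimal reproduction, assuming the 'distinct' branch allocates a 2D padded buffer sized from [m,n] = size(a). The allocation is not shown in the excerpt, so zeros and the 6-by-15 size are stand-ins:

```matlab
a = cat(3, magic(5), magic(5)+50, magic(5)+100);   % 5-by-5-by-3
[m, n] = size(a);          % m = 5, n = 15 (5*3 collapsed)
aa = zeros(m + 1, n);      % stand-in buffer: rows padded to a multiple of 3
failed = false;
try
    aa(1:m, 1:n) = a;      % left side is 5-by-15, right side is 5-by-5-by-3
catch err
    failed = true;
    disp(err.message)
end
assert(failed)
```

The indexed region on the left is a 5-by-15 matrix, while the right-hand side is still a 5-by-5-by-3 array, so the shapes do not match even though the element counts do.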




Now that we understand what was happening, let's think how to do this properly for 3D arrays.

In my opinion, to implement im2col for color images, I would just run it on each color channel separately (each being a 2D matrix), and concatenate the results along the third dimension. So something like this wrapper function:

function B = im2col_rgb(img, sz, varargin)
    B = cell(1,size(img,3));
    for i=1:size(img,3)
        B{i} = im2col(img(:,:,i), sz, varargin{:});
    end
    B = cat(3, B{:});
end
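
A short usage sketch, assuming im2col_rgb above is saved on the MATLAB path and reusing the magic(5) test image from the earlier example:

```matlab
M  = magic(5);
MM = cat(3, M, M+50, M+100);

B = im2col_rgb(MM, [3 3], 'sliding');   % one 9-by-9 page per channel
assert(isequal(size(B), [9 9 3]))

% Each page matches im2col applied to that channel alone.
assert(isequal(B(:,:,1), im2col(M,     [3 3], 'sliding')))
assert(isequal(B(:,:,2), im2col(M+50,  [3 3], 'sliding')))
assert(isequal(B(:,:,3), im2col(M+100, [3 3], 'sliding')))
```

Unlike the accidental 3D behavior, no block here mixes pixels from different channels, and the same wrapper works with 'distinct' because every call to im2col receives a true 2D matrix.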
