8eastFromThe3ast

Reputation: 197

MATLAB find mean of column in matrix using two different indices

I have a 22007x3 matrix with data in column 3 and two separate indices in columns 1 and 2.

e.g.

x = 

    1   3   4
    1   3   5
    1   3   5
    1   16  4
    1   16  3
    1   16  4
    2   4   1
    2   4   3
    2   11  2
    2   11  3
    2   11  2

I need to find the mean of the values in column 3 when the values in column 1 are the same AND the values in column 2 are the same, to end up with something like:

ans = 

    1   3   4.6667
    1   16  3.6667
    2   4   2
    2   11  2.3333

Please bear in mind that in my data, the number of times the values in column 1 and 2 occur can be different.

Two options I've tried already are the meshgrid/accumarray option, using two distinct unique functions and a 3D array:

[U, ix, iu] = unique(x(:, 1));
[U2,ix2,iu2] = unique(x(:,2));
[c, r, j] = meshgrid((1:size(x(:, 1), 2)), iu, iu2);
totals = accumarray([r(:), c(:), j(:)], x(:), [], @nanmean);

which gives me this:

??? Maximum variable size allowed by the program is exceeded.

Error in ==> meshgrid at 60
    xx = xx(ones(ny,1),:,ones(nz,1));

and the loop option,

for i=1:size(x,1)
    if x(i,2)== x(i+1,2);
        totals(i,:)=accumarray(x(:,1),x(:,3),[],@nanmean);
    end
end

which is obviously so very, very wrong, not least because of the x(i+1,2) bit.

I'm also considering creating separate matrices depending on how many times a value in column 1 occurs, but that would be long and inefficient, so I'm loath to go down that road.

Upvotes: 1

Views: 1624

Answers (2)

Floris

Reputation: 46435

This is an ideal opportunity to use sparse matrix math.

x = [ 1 2 5;
      1 2 7;
      2 4 6;
      3 4 6;
      1 4 8;
      2 4 8;
      1 1 10]; % for example

SM = sparse(x(:,1), x(:,2), x(:,3)); % duplicate (row, column) pairs are summed
disp(SM)

Result:

(1,1)   10
(1,2)   12
(1,4)    8
(2,4)   14
(3,4)    6

As you can see, we did the "accumulate same indices into same container" in one fell swoop. Now you need to know how many elements you have:

NE = sparse(x(:,1), x(:,2), ones(size(x(:,1))));
disp(NE);

Result:

(1,1)   1
(1,2)   2
(1,4)   1
(2,4)   2
(3,4)   1

Finally, you divide one by the other to get the mean (only use elements that have a value):

matrixMean = SM;
nz = find(NE>0);
matrixMean(nz) = SM(nz) ./ NE(nz);

If you then disp(matrixMean), you get

(1,1)    10
(1,2)     6
(1,4)     8
(2,4)     7
(3,4)     6

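If you would rather have the result as a list of (row, column, mean) triples, then after you have computed SM and NE you can do

[i, j, n] = find(NE);                      % index pairs that occur, with their counts
s = full(SM(sub2ind(size(SM), i, j)));     % matching sums; SM(i,j) would return a submatrix instead
disp([i(:) j(:) s(:) ./ n(:)]);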
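
Applied to the sample data from the question, the sum/count idea above can be sketched as follows. This is only a sketch under a few assumptions: columns 1 and 2 hold positive integers (sparse uses them as subscripts), column 3 contains no NaN values (a NaN would propagate into the sum), and the variable names r, c, s and result are just illustrative.

```matlab
% Sample data from the question: columns 1 and 2 are indices, column 3 holds the values
x = [1 3 4; 1 3 5; 1 3 5; 1 16 4; 1 16 3; 1 16 4; ...
     2 4 1; 2 4 3; 2 11 2; 2 11 3; 2 11 2];

SM = sparse(x(:,1), x(:,2), x(:,3));    % sum of column 3 for each (col1, col2) pair
NE = sparse(x(:,1), x(:,2), 1);         % number of rows for each (col1, col2) pair

[r, c, n] = find(NE);                   % pairs that actually occur, with their counts
s = full(SM(sub2ind(size(SM), r, c)));  % read the matching sums by linear index
result = sortrows([r, c, s ./ n]);      % one row per pair: [col1 col2 mean]
disp(result)
```

Reading the sums through sub2ind rather than find(SM) keeps a pair whose values happen to sum to zero, since the set of pairs is taken from the counts in NE.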

Upvotes: 1

Oleg

Reputation: 10676

Group the rows on the first two columns with unique(...,'rows'), then accumulate only the third column. It is generally best to accumulate only the data that actually needs accumulating and to leave the index columns out of it; you can reattach them afterwards from unX:

[unX,~,subs] = unique(x(:,1:2),'rows');
out          = [unX accumarray(subs,x(:,3),[],@nanmean)];

out =
            1            3       4.6667
            1           16       3.6667
            2            4            2
            2           11       2.3333
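
If nanmean is not available (to my knowledge it ships with the Statistics Toolbox), the same grouping can be sketched with plain sums and counts. This is a hedged variant rather than the exact code above: it assumes NaN entries in column 3 should simply be skipped, and the names vals, valid, sums and counts are illustrative.

```matlab
% Sample data from the question
x = [1 3 4; 1 3 5; 1 3 5; 1 16 4; 1 16 3; 1 16 4; ...
     2 4 1; 2 4 3; 2 11 2; 2 11 3; 2 11 2];

[unX, ~, subs] = unique(x(:,1:2), 'rows');   % one group id per distinct (col1, col2) pair
vals  = x(:,3);
valid = ~isnan(vals);                        % skip NaN entries, as nanmean would
nGrp  = size(unX, 1);

sums   = accumarray(subs(valid), vals(valid), [nGrp 1]);  % per-group sum
counts = accumarray(subs(valid), 1,           [nGrp 1]);  % per-group count of non-NaN values
out    = [unX, sums ./ counts];              % a group with only NaN values gives 0/0 = NaN
disp(out)
```

Passing the output size [nGrp 1] explicitly keeps one row per group even when every value in the last group is NaN and has been filtered out.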

Upvotes: 5
