Reputation: 197
I have a 352x11 matrix whose first column is an index and whose remaining 10 columns hold data points. Some of the index values are repeated. I'd like to find the repeated indices and calculate the mean of the data points for the repeated trials (avoiding loops, if possible).
For example,
x =
26 77.5700 17.9735 32.7200
27 40.5887 16.6100 31.5800
28 60.4734 18.5397 33.6200
28 35.6484 27.2000 54.8000
29 95.3448 19.0000 37.7300
30 82.7273 30.4394 39.1400
to end up with:
ans =
26 77.5700 17.9735 32.7200
27 40.5887 16.6100 31.5800
28 48.0609 22.8699 44.2100
29 95.3448 19.0000 37.7300
30 82.7273 30.4394 39.1400
I was thinking that if I used
J = find(diff(x(:,1))==0);
to find the positions of the repeated values, I could then apply the function to the corresponding rows of x, but where do I begin?
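One way to extend the diff idea from the question into a loop-free grouping is sketched below. It assumes the rows are already sorted by the index column, as in the example; unsorted data would need sortrows first, otherwise equal but non-adjacent indices end up in separate groups. The names g and result are hypothetical, and the short loop runs over the columns only, not over the rows.

```matlab
% Example data from the question, already sorted by the index column
x = [26 77.5700 17.9735 32.7200;
     27 40.5887 16.6100 31.5800;
     28 60.4734 18.5397 33.6200;
     28 35.6484 27.2000 54.8000;
     29 95.3448 19.0000 37.7300;
     30 82.7273 30.4394 39.1400];

% diff(...) ~= 0 flags every row whose index differs from the previous row;
% prepending 1 starts the first group, and cumsum turns the flags into
% consecutive group labels: here g = [1; 2; 3; 3; 4; 5]
g = cumsum([1; diff(x(:, 1)) ~= 0]);

% Average each column within every group (the loop runs over columns, not rows)
result = zeros(max(g), size(x, 2));
for k = 1:size(x, 2)
    result(:, k) = accumarray(g, x(:, k), [], @mean);
end
```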
Upvotes: 2
Views: 2783
Reputation: 12345
Given your input
x = [ ...
26 77.5700 17.9735 32.7200; ...
27 40.5887 16.6100 31.5800; ...
28 60.4734 18.5397 33.6200; ...
28 35.6484 27.2000 54.8000; ...
29 95.3448 19.0000 37.7300; ...
30 82.7273 30.4394 39.1400];
You can create an array of indexes where duplicated values share the same index, using the third output of unique.
%Get index of unique values (1 - N)
[~, ~, ix] = unique(x(:,1))
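To make the mapping concrete, here is a small sketch on just the index column of the example (idx is a hypothetical variable name):

```matlab
% First column of the example matrix from the question
idx = [26; 27; 28; 28; 29; 30];

% The third output of unique maps each element to the position of its value
% in the sorted list of unique values: 26 -> 1, 27 -> 2, both 28s -> 3,
% 29 -> 4, 30 -> 5
[~, ~, ix] = unique(idx);
disp(ix.')
```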
Then you can use this array to rebuild your matrix, combining duplicated values with the function of your choice.
%Use accumarray to rebuild the matrix one column at a time
result = [...
accumarray( ix, x(:,1), [], @max ) ... %Many functions work here, as all inputs in a group are equal, e.g. @mean, @max, @min
accumarray( ix, x(:,2), [], @mean ) ... %Use mean to combine data, per problem statement.
accumarray( ix, x(:,3), [], @mean ) ...
accumarray( ix, x(:,4), [], @mean ) ...
]
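With 11 columns, writing one accumarray call per column gets tedious. A possible variant, sketched here with arrayfun (cols and result are hypothetical names), applies the same call to every column; using @mean on column 1 simply returns the shared index value of each group.

```matlab
x = [26 77.5700 17.9735 32.7200;
     27 40.5887 16.6100 31.5800;
     28 60.4734 18.5397 33.6200;
     28 35.6484 27.2000 54.8000;
     29 95.3448 19.0000 37.7300;
     30 82.7273 30.4394 39.1400];

[~, ~, ix] = unique(x(:, 1));
ix = ix(:);   % force a column vector of subscripts for accumarray

% One accumarray call per column, generated instead of written by hand
cols = arrayfun(@(k) accumarray(ix, x(:, k), [], @mean), 1:size(x, 2), ...
                'UniformOutput', false);
result = [cols{:}];   % concatenate the column results side by side
```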
Upvotes: 0
Reputation: 45752
You can apply accumarray to multiple columns at once, as shown here:
labels = x(:,1) - min(x(:, 1)) + 1;
labels = [repmat(labels(:),size(x,2),1), kron(1:size(x,2),ones(1,numel(labels))).'];
totals = accumarray(labels,x(:),[], @mean);
This is adapted from Gnovice's code.
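To see what the two-column subscript matrix looks like, here is a sketch on a small 3x3 example (the name subs is hypothetical; it holds the same matrix the code above stores back into labels). Each row of subs pairs a group label with a column number, in the column-by-column order of x(:).

```matlab
% Small example: index column plus two data columns; index 28 appears twice
x = [27 10 8;
     28 20 10;
     28 30 50];

labels = x(:,1) - min(x(:, 1)) + 1;   % 27 -> 1, 28 -> 2
% Row i of subs is [group label, column number] for the i-th element of x(:)
subs = [repmat(labels(:), size(x,2), 1), kron(1:size(x,2), ones(1, numel(labels))).'];
totals = accumarray(subs, x(:), [], @mean);
```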
accumarray fills every label that never occurs (for example, a gap in the index values) with a row of zeros, so you then need to delete those all-zero rows:
totals(all(totals == 0, 2), :) = [];
which results in the desired output:
26.0000 77.5700 17.9735 32.7200
27.0000 40.5887 16.6100 31.5800
28.0000 48.0609 22.8699 44.2100
29.0000 95.3448 19.0000 37.7300
30.0000 82.7273 30.4394 39.1400
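A sketch of the gap case, using made-up data in which index 27 is missing. The removal step is written as all(totals == 0, 2), which deletes only rows that are zero in every column, so a genuine data value of 0 does not cause its row to be dropped.

```matlab
% Made-up data: index 27 is missing, so label 2 never occurs
x = [26 1 2;
     28 3 4;
     28 5 6];

labels = x(:,1) - min(x(:, 1)) + 1;   % 26 -> 1, 28 -> 3
labels = [repmat(labels(:), size(x,2), 1), kron(1:size(x,2), ones(1, numel(labels))).'];
totals = accumarray(labels, x(:), [], @mean);   % row 2 is filled with zeros

% Delete only the rows that are zero in every column
totals(all(totals == 0, 2), :) = [];
```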
Upvotes: 4
Reputation: 32920
A more general approach would employ unique to find the unique index values:
[U, ix, iu] = unique(x(:, 1));
and then accumarray:
[c, r] = meshgrid(1:size(x, 2), iu);
y = accumarray([r(:), c(:)], x(:), [], @mean);
The input values to process are actually the second parameter of accumarray.
The first parameter of accumarray is a matrix in which each row is a set of indices into the (accumulated) output matrix; each row corresponds to the value in the matching row of the vector given as the second parameter.
Think of the output as a cell array. The second parameter holds the input values, and each row of the first parameter tells accumarray in which cell of the output matrix to store the corresponding input value. When the output "cell array" is complete, a function (mean in our case) is applied to each cell.
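A tiny sketch of that mapping, with made-up subscripts and values (subs, vals, A and B are hypothetical names). The last two subscript rows point at the same output cell, so their values are combined: summed by default, averaged when @mean is passed.

```matlab
% Three subscripts into a 2-by-1 output; the last two rows share cell (2,1)
subs = [1 1;
        2 1;
        2 1];
vals = [5; 7; 9];

A = accumarray(subs, vals);              % default reduction is sum: [5; 16]
B = accumarray(subs, vals, [], @mean);   % mean of each cell: [5; 8]
```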
Here's a short example with a smaller matrix:
x = [27, 10, 8;
28, 20, 10;
28, 30, 50];
We find the unique values by:
[U, ix, iu] = unique(x(:, 1));
Vector U stores the unique values, and iu gives, for each row of x, the index into U of that row's value (note that in this solution we have no use for ix). In our case we get:
U =
27
28
iu =
1
2
2
Now we apply accumarray:
[c, r] = meshgrid(1:size(x, 2), iu);
y = accumarray([r(:), c(:)], x(:), [], @mean);
The fancy trick with meshgrid and [r(:), c(:)] produces a set of indices:
[r(:), c(:)] =
1 1
2 1
2 1
1 2
2 2
2 2
1 3
2 3
2 3
and these are the indices for the input values x(:), which is the column-vector equivalent of x:
x(:) =
27
28
28
10
20
30
8
10
50
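The process of accumulation: the first input value, 27, is stored in cell (1,1), while both values 28 are stored in the same cell (2,1), where they will eventually be averaged. The process continues in the same way for the other columns: 10 goes to cell (1,2) and 20 and 30 share cell (2,2); 8 goes to cell (1,3) and 10 and 50 share cell (2,3).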
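The intermediate "cell array" can actually be inspected by passing accumarray a function that wraps each group in a cell instead of reducing it. This is a sketch on the same small example (cells and y are hypothetical names); the order of values inside a cell is not something to rely on, which does not matter for mean.

```matlab
x = [27, 10, 8;
     28, 20, 10;
     28, 30, 50];
[~, ~, iu] = unique(x(:, 1));
[c, r] = meshgrid(1:size(x, 2), iu);

% Instead of reducing each cell, keep its collected values in a cell array
cells = accumarray([r(:), c(:)], x(:), [], @(v) {v});

% Applying mean to every cell reproduces accumarray(..., @mean)
y = cellfun(@mean, cells);
```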
Once all values are stored in cells, the function mean is applied to each cell and we get the final output matrix:
y =
27 10 8
28 25 30
Upvotes: 6
Reputation: 114796
You might find accumarray with @mean useful.
Assuming the first column holds the values 1 .. k for some k <= size(x,1), you can compute each column of the output using
col = accumarray( x(:,1), x(:,2), [], @mean ); % second column
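If the first column does not start at 1, as in the question (26 to 30), accumarray still works but uses the index values directly as row numbers, so the rows for unused indices are zero-filled. A sketch of that behavior and of one way to keep only the occurring indices (colFull, U and col are hypothetical names):

```matlab
x = [26 77.5700 17.9735 32.7200;
     27 40.5887 16.6100 31.5800;
     28 60.4734 18.5397 33.6200;
     28 35.6484 27.2000 54.8000;
     29 95.3448 19.0000 37.7300;
     30 82.7273 30.4394 39.1400];

% Index values 26..30 are used directly as row subscripts, so the result
% has 30 rows and rows 1..25 are filled with zeros
colFull = accumarray(x(:, 1), x(:, 2), [], @mean);

% Keep only the rows whose index actually occurs in the data
U = unique(x(:, 1));
col = colFull(U);
```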
Upvotes: 0