Reputation: 858
I have the following data:
A = [1 2 ; 3 2; 4 7; 10 2; 6 7; 10 9]
B = [1 2 3; 4 4 9; 1 8 0; 3 7 9; 3 6 8]
C = [4; 10; 6; 3; 1]
A =
1 2
3 2
4 7
10 2
6 7
10 9
B =
1 2 3
4 4 9
1 8 0
3 7 9
3 6 8
C.' =
4 10 6 3 1
For each unique value in A(:,2), I need to take the corresponding values in A(:,1), look for their value in C, then take the relevant rows in B and compute their mean. The result should be length(unique(A(:,2))) x size(B,2).
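The expected result for this example is a 3 x 3 matrix, with one row for each of the unique values 2, 7 and 9 in A(:,2).
Explanation: the values 1, 3 and 10, which correspond to the value "2" in A(:,2), are found at indices 2, 4 and 5 in C, so the first row of the result is the mean of rows 2, 4 and 5 of B. The rows for the other unique values are computed correspondingly.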
I currently compute it by applying unique on A and iterating over each value, searching for the right indices. My data set is quite large, so this takes a long time. How can I avoid the loops?
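For reference, the loop-based approach described above might look roughly like the sketch below. This is a guess at the baseline, not the asker's actual code; the names U, vals, rows and result_loop are introduced here for illustration, and it assumes every value in A(:,1) occurs in C.

```matlab
% Sample data from the question
A = [1 2; 3 2; 4 7; 10 2; 6 7; 10 9];
B = [1 2 3; 4 4 9; 1 8 0; 3 7 9; 3 6 8];
C = [4; 10; 6; 3; 1];

U = unique(A(:,2));                        % unique grouping values: 2, 7, 9
result_loop = zeros(numel(U), size(B,2));  % one row per group, one column per column of B
for k = 1:numel(U)
    vals = A(A(:,2) == U(k), 1);           % values of A(:,1) that belong to this group
    [~, rows] = ismember(vals, C);         % position of each of those values in C
    result_loop(k,:) = mean(B(rows,:), 1); % column-wise mean (dim 1 also handles single-row groups)
end
disp(result_loop)
```

Passing the dimension to mean matters here: the last group has a single row, and mean without a dimension argument would average across that row instead of returning it.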
Upvotes: 1
Views: 1026
Reputation: 9864
Here is another solution without arrayfun and accumarray, using good old-fashioned matrix multiplication:
r = bsxfun(@eq, A(:,1), C')*(1:numel(C))';
[~,m,n] = unique(A(:,2));
f=histc(n, 1:numel(m));
result = diag(1./f)*bsxfun(@eq, 1:numel(m), n).'*B(r,:);
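To see why the multiplication works, here is the same computation on the sample data with the intermediate matrices named. This is only an annotated sketch: M and G are names introduced for illustration, and it assumes every value in A(:,1) occurs exactly once in C.

```matlab
% Sample data from the question
A = [1 2; 3 2; 4 7; 10 2; 6 7; 10 9];
B = [1 2 3; 4 4 9; 1 8 0; 3 7 9; 3 6 8];
C = [4; 10; 6; 3; 1];

M = bsxfun(@eq, A(:,1), C');          % M(i,j) is true when A(i,1) == C(j)
r = double(M) * (1:numel(C))';        % each row of M holds a single 1, so this picks out its column index
% r is [5; 4; 1; 2; 3; 2]

[~, m, n] = unique(A(:,2));           % n(i) is the group number of row i
G = bsxfun(@eq, 1:numel(m), n(:)).';  % G(g,i) is true when row i belongs to group g
f = sum(G, 2);                        % group sizes, the same counts histc(n, 1:numel(m)) gives
% f is [3; 2; 1]

% G * B(r,:) sums the selected rows of B per group; diag(1./f) turns the sums into means
result = diag(1./f) * double(G) * B(r,:);
disp(result)
```

Both bsxfun calls build a full indicator matrix (size(A,1) x numel(C) and numel(m) x size(A,1)), so memory use grows with the product of those sizes.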
I ran a benchmark against the other two solutions, with 1000 repetitions each, and it appears to be faster than both.
Here is the benchmark code:
N = 1e3;
tic
for k=1:N,
r = bsxfun(@eq, A(:,1), C')*(1:numel(C))'; % faster than [~,r] = ismember(A(:,1), C)
[~,m,n] = unique(A(:,2));
f=histc(n, 1:numel(m));
result2 = diag(1./f)*bsxfun(@eq, 1:numel(m), n).'*B(r,:);
end
toc
tic
for k=1:N,
[U, ia, iu] = unique(A(:, 2));
[tf, loc] = ismember(A(:, 1), C);
[X, Y] = meshgrid(1:size(B, 2), iu);
result1 = accumarray([Y(:), X(:)], reshape(B(loc, :), [], 1), [], @mean);
end
toc
tic
for k=1:N,
D = [arrayfun(@(x) find(C == x,1,'first'), A(:,1) ), A(:,2)];
data = [B(D(:,1),:), D(:,2)];
st = grpstats(data(:,1:3),data(:,4:4),{'mean'});
end
toc
Upvotes: 3
Reputation: 32920
Let's do what you say in the question step by step:
For each unique value in A(:, 2):
[U, ia, iu] = unique(A(:, 2));
Take the corresponding values in A(:, 1) and look for their value in C:
[tf, loc] = ismember(A(:, 1), C);
It's also recommended to make sure, just in case, that all values are actually found in C:
assert(all(tf))
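As a small illustration of why this check matters: ismember returns 0 in loc for any value it cannot find, and 0 is not a valid row index into B. The vector A1 and the value 99 below are made up for this example.

```matlab
C = [4; 10; 6; 3; 1];
A1 = [1; 3; 99];                 % hypothetical first column; 99 does not occur in C
[tf, loc] = ismember(A1, C);
% tf  is [1; 1; 0]
% loc is [5; 4; 0], so B(loc, :) would raise an indexing error

if ~all(tf)
    % report the offending values instead of failing later on the indexing step
    fprintf('Values not found in C: %s\n', mat2str(A1(~tf)));
end
```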
Then take the relevant rows in B and compute their mean:
[X, Y] = meshgrid(1:size(B, 2), iu);
result = accumarray([Y(:), X(:)], reshape(B(loc, :), [], 1), [], @mean);
Hope this helps! :)
%// Sample input
A = [1 2 ; 3 2; 4 7; 10 2; 6 7; 10 9];
B = [1 2 3; 4 4 9; 1 8 0; 3 7 9; 3 6 8];
C = [4; 10; 6; 3; 1];
%// Compute means
[U, ia, iu] = unique(A(:, 2));
[tf, loc] = ismember(A(:, 1), C);
[X, Y] = meshgrid(1:size(B, 2), iu);
result = accumarray([Y(:), X(:)], reshape(B(loc, :), [], 1), [], @mean);
The result is:
result =
3.3333 5.6667 8.6667
1.0000 5.0000 1.5000
4.0000 4.0000 9.0000
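The rows can be checked by hand against the sample data. The sketch below hard-codes the row indices of B that each group maps to; row1, row2 and row3 are names introduced for this check.

```matlab
B = [1 2 3; 4 4 9; 1 8 0; 3 7 9; 3 6 8];

row1 = mean(B([2 4 5], :), 1);  % group 2: values 1, 3, 10 sit at indices 5, 4, 2 of C
row2 = mean(B([1 3], :), 1);    % group 7: values 4, 6 sit at indices 1, 3 of C
row3 = mean(B(2, :), 1);        % group 9: value 10 sits at index 2 of C
disp([row1; row2; row3])
```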
Upvotes: 6
Reputation: 858
Thanks, I also thought of:
D = [arrayfun(@(x) find(C == x,1,'first'), A(:,1) ), A(:,2)];
data = [B(D(:,1),:), D(:,2)];
st = grpstats(data(:,1:3),data(:,4:4),{'mean'});
Upvotes: 1