I have a table whose rows fall into groups, given by unique combinations of a few columns. I wish to compute the number of rows in each group. What is the most efficient way to do this? I know I can use grpstats, but it seems to be very inefficient when there are a large number of groups.
For example:
rng(0,'twister');
N = 30; % control number of groups
c1 = randi([1 N],1000000,1);
c2 = randi([1 N],1000000,1);
c3 = randi([1 N],1000000,1);
T = array2table([c1 c2 c3]);
tic; gT = grpstats(T, {'Var1' 'Var2' 'Var3'}, 'numel'); toc;
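With grpstats, the run time seems to grow quadratically or worse. Because each of the three columns takes values from 1 to N, there are up to N^3 groups (27, 1000 and 27000 for the cases below). With N = 3 it takes 0.73 seconds on my machine, with N = 10 it takes 2.6 seconds, and with N = 30 it takes 72 seconds.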
This seems to be about 80 times faster for your example (table with 1000000 rows and 3 columns):
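% gT holds the distinct rows of T; v(k) is the group index of row k of T
[gT, ~, v] = unique(T, 'rows');
% Count rows per group: accumarray adds a 1 into bin v(k) for every row k
gT.GroupCount = accumarray(v, 1);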
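To see why this works, it can help to run the same two steps on a tiny hypothetical table (the values below are made up for illustration) where the counts are obvious by inspection. unique already compares whole rows when given a table, so v maps each row of T to the row of gT it matches, and accumarray(v, 1) counts how many times each group index occurs. The findgroups lines are an alternative (available since R2015b), not part of the answer above; treat this as a sketch rather than a benchmark.

```matlab
% Toy table with three grouping columns (hypothetical data for illustration)
T = array2table([1 1 2; 1 1 2; 2 3 1; 1 1 2; 2 3 1]);

% unique on a table returns the distinct rows in sorted order;
% v(k) is the index into gT of the group that row k of T belongs to
[gT, ~, v] = unique(T);
gT.GroupCount = accumarray(v, 1);   % rows [1 1 2] -> 3, [2 3 1] -> 2

% Alternative: findgroups numbers each unique combination of the columns,
% then accumarray counts the rows carrying each group number
G = findgroups(T.Var1, T.Var2, T.Var3);
counts = accumarray(G, 1);

disp(gT)
disp(counts)
```

In R2019a and later, groupcounts(T, {'Var1','Var2','Var3'}) should also return per-group counts in a GroupCount column (together with a Percent column); how its speed compares with unique plus accumarray on the 1000000-row example has not been measured here.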