Alex

Reputation: 15708

Most efficient way of counting number of rows in each group of a table

I have a table whose rows fall into groups, where each group is defined by a unique combination of values in a few columns. I want to compute the number of rows in each group. What is the most efficient way to do this? I know I can use grpstats, but it seems to be very inefficient when there are many groups.

For example:

rng(0,'twister');
N = 30; % each column takes values in 1..N, so there are up to N^3 groups

c1 = randi([1 N],1000000,1);
c2 = randi([1 N],1000000,1);
c3 = randi([1 N],1000000,1);

T = array2table([c1 c2 c3]);

tic; gT = grpstats(T, {'Var1' 'Var2' 'Var3'}, 'numel'); toc;

The running time of grpstats appears to grow quadratically or worse in N (the number of groups is at most N^3). With N = 3 it takes 0.73 seconds on my machine, with N = 10 it takes 2.6 seconds, and with N = 30 it takes 72 seconds.

Upvotes: 2

Views: 118

Answers (1)

Luis Mendo

Reputation: 112659

This seems to be about 80 times faster for your example (table with 1000000 rows and 3 columns):

[gT, ~, v] = unique(T, 'rows');
gT.GroupCount = accumarray(v, 1);
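
To see why this works, here is a minimal sketch on a tiny table (the six rows below are made up purely for illustration). The third output of unique gives, for each row of T, the index of the matching row in gT, and accumarray then adds 1 into the bin for that index, so each bin ends up holding the group's row count.

```matlab
% Toy table with three distinct rows: (1,2) x3, (3,4) x2, (5,6) x1
T = array2table([1 2; 1 2; 3 4; 1 2; 3 4; 5 6]);

% gT holds the unique rows in sorted order;
% v(i) is the row of gT that row i of T maps to
[gT, ~, v] = unique(T, 'rows');

% Sum a 1 into bin v(i) for every row i, giving one count per group
gT.GroupCount = accumarray(v, 1);

disp(v.')    % 1 1 2 1 2 3
disp(gT)     % rows (1,2), (3,4), (5,6) with GroupCount 3, 2, 1
```

Note that only combinations that actually occur in T appear in gT, so groups with zero rows are not listed.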

Upvotes: 3
