Reputation: 143
I have a vector of about 15 million integers (int32) containing about 1 million unique values. I would like to find an efficient method of counting the repeated integers and relating them back to the original vector so that I have a new vector containing counts for each element.
That's probably clear as mud, so here's a small example of what I'm looking for:
A = [1 1 1 3 2 5 2 1 2 6];
...
result = [4 4 4 1 3 1 3 4 3 1];
My (impractically slow ) implementation is as follows (Note that in my version of matlab, hist does not work with integers):
A = randi(1e6,[15e6,1],'int32');
result = zeros(size(A),'int32');
[uniqueA,~,iuA] = unique(A);
counts = accumarray(iuA,1);
So far, so good: uniqueA contains a list of the unique elements of A, and counts contains a list of the corresponding number of each. This is pretty quick.
Next comes the slow part. I have tried the following to retrieve the indices of each element:
cellIndex = arrayfun(@(x) A == x, uniqueA,'UniformOutput',false);
but this runs out of memory (with 16 GB ram) and grinds to a halt when it starts swapping. To avoid this I tried looping over the unique elements (1 million), which is also slow:
for n = 1:length(uA)
result(A == uA(n)) = counts(n);
end
I don't know how long this takes, because I've been waiting half an hour and it still isn't finished.
Any ideas on how I can accomplish my task efficiently?
Upvotes: 3
Views: 89
Reputation: 221624
One approach -
[unq,~,idx] = unique(A);
out = changem(idx,histc(A,unq),1:max(idx))
Sample run -
>> A
A =
1 1 1 3 2 5 2 1 2 6
>> [unq,~,idx] = unique(A);
>> changem(idx,histc(A,unq),1:max(idx))
ans =
4 4 4 1 3 1 3 4 3 1
Here's a simpler version -
[unq,~,idx] = unique(A);
counts = histc(A,unq);
out = counts(idx)
Sample run -
>> A
A =
1 1 1 3 2 5 2 1 2 6
>> [unq,~,idx] = unique(A);
>> counts = histc(A,unq);
>> counts(idx)
ans =
4 4 4 1 3 1 3 4 3 1
Upvotes: 3