Reputation: 3055
Is there a smart way to count the number of occurrences of each value in a very large PyTorch tensor? The tensor size is 11701*300 = 3510300, though it may increase or decrease.
torch.bincount, torch.unique and torch.unique_consecutive have not been useful so far.
bincount returns a different number of elements every time, and unique is not useful either, as it only returns the unique values.
This is what I meant when I said it returns a different number of elements every time. If a 5-element input returns an 8-element tensor, how am I supposed to know which element occurs how many times? This is confusing to me. The official documentation has limited content, and no other website explains it.
In the above picture, 5 appears 2 times. And 0 appears how often, 0 times? How do I read this output? It doesn't make any sense to me.
Upvotes: 1
Views: 11329
Reputation: 487
Actually, the problem is how you read the output. The output of torch.bincount is a tensor of size max(input)+1, which means it covers all bins of width 1 from zero to max(input). Therefore, reading the output tensor from its first element, you see how many 0s, 1s, 2s, ..., max(input)s there are in your non-negative integer tensor.
For example:
import torch
t1 = torch.randint(0, 10, (20,))
print(t1)
tensor([2, 5, 7, 3, 1, 2, 7, 8, 8, 0, 5, 6, 4, 4, 4, 6, 3, 0, 6, 6])
In this tensor the max value is 8 (by chance, 9 did not appear), so it gives:
print(torch.bincount(t1).size())
print(torch.bincount(t1))
torch.Size([9])
tensor([2, 1, 2, 2, 3, 2, 4, 2, 2])
That means that in the tensor t1 there are two 0s, one 1, two 2s, ..., and two 8s.
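If you want each value paired explicitly with its count, here is a minimal sketch using the same t1 as above. It assumes torch.unique with return_counts=True is acceptable for your case. The small 2-D tensor `big` is a hypothetical stand-in for the 11701*300 tensor, and it is flattened first because torch.bincount expects a 1-D input.

```python
import torch

t1 = torch.tensor([2, 5, 7, 3, 1, 2, 7, 8, 8, 0, 5, 6, 4, 4, 4, 6, 3, 0, 6, 6])

# bincount: the index into the result is the value, the entry is its count.
counts = torch.bincount(t1)
for value, count in enumerate(counts.tolist()):
    print(f"{value} appears {count} times")

# unique with return_counts=True lists only the values that actually occur,
# together with how often each one occurs.
values, value_counts = torch.unique(t1, return_counts=True)
pairs = dict(zip(values.tolist(), value_counts.tolist()))
print(pairs)

# Hypothetical 2-D stand-in for the large tensor: flatten before bincount.
big = torch.randint(0, 10, (4, 5))
big_counts = torch.bincount(big.flatten())
```

Unlike bincount, torch.unique does not need the values to be non-negative integers, and it skips values that never appear instead of reporting a count of 0 for them.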
Upvotes: 4