Reputation: 277
Given a 2 x d dimensional numpy array M, I want to count the number of occurrences of each column of M. That is, I'm looking for a general version of bincount.
What I tried so far: (1) converted the columns to tuples, (2) hashed the tuples (via hash) to natural numbers, (3) used numpy.bincount.
This seems rather clumsy. Is anybody aware of a more elegant and efficient way?
Upvotes: 6
Views: 4106
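The three steps described above can be sketched as follows. Note that raw hash values are far too large to feed to bincount directly, so this sketch assumes the "hashing" step maps each distinct column tuple to a small consecutive id via a dict (the dict-based relabeling is my assumption, not spelled out in the question):

```python
import numpy as np

# Example 2 x d array; columns are the items to count.
M = np.array([[0, 1, 2, 4, 5, 1, 2, 3],
              [4, 5, 6, 8, 9, 5, 6, 7]])

# (1) Convert columns to tuples.
cols = [tuple(col) for col in M.T]

# (2) Map each distinct tuple to a small integer id (assumed stand-in
# for "hash to natural numbers", since bincount needs small ids).
ids = {}
labels = np.array([ids.setdefault(c, len(ids)) for c in cols])

# (3) Count the ids; counts[i] is the multiplicity of the i-th
# distinct column in first-seen order.
counts = np.bincount(labels)
```

Here `ids` maps each unique column to its position in first-seen order, so `counts` lines up with `list(ids)`.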
Reputation: 5324
Given:
a = np.array([[ 0,  1,  2,  4,  5,  1,  2,  3],
              [ 4,  5,  6,  8,  9,  5,  6,  7],
              [ 8,  9, 10, 12, 13,  9, 10, 11]])
b = np.transpose(a)
A more efficient solution than hashing (though it still requires some manipulation): I create a view of the array with the flexible data type np.void (see here) so that each row becomes a single element, which allows np.unique to operate on it.
%%timeit
c = np.ascontiguousarray(b).view(np.dtype((np.void, b.dtype.itemsize * b.shape[1])))
_, index, counts = np.unique(c, return_index=True, return_counts=True)
# counts are in the last column; remember the original array is transposed
np.concatenate((b[index], counts[:, None]), axis=1)
array([[ 0,  4,  8,  1],
       [ 1,  5,  9,  2],
       [ 2,  6, 10,  2],
       [ 3,  7, 11,  1],
       [ 4,  8, 12,  1],
       [ 5,  9, 13,  1]])
10000 loops, best of 3: 65.4 µs per loop
The counts are appended as a final column to the unique columns of a.
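On NumPy 1.13 and later, np.unique accepts an axis argument, which makes the void-view trick unnecessary; a minimal sketch of the same computation:

```python
import numpy as np

a = np.array([[ 0,  1,  2,  4,  5,  1,  2,  3],
              [ 4,  5,  6,  8,  9,  5,  6,  7],
              [ 8,  9, 10, 12, 13,  9, 10, 11]])

# Treat each column as one item: unique columns plus their counts.
# Columns come back sorted lexicographically.
uniq_cols, counts = np.unique(a, axis=1, return_counts=True)
```

`uniq_cols` is a 3 x 6 array whose columns are the distinct columns of a, and `counts[i]` is the multiplicity of column i.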
Your hashing solution:
%%timeit
array_hash = [hash(tuple(row)) for row in b]
_, index, counts = np.unique(array_hash, return_index=True, return_counts=True)
np.concatenate((b[index], counts[:, None]), axis=1)
10000 loops, best of 3: 89.5 µs per loop
Update: Eph's solution is the most efficient and elegant.
%%timeit
Counter(map(tuple, a.T))
10000 loops, best of 3: 38.3 µs per loop
Upvotes: 2
Reputation: 2028
You can use collections.Counter:
>>> import numpy as np
>>> a = np.array([[ 0, 1, 2, 4, 5, 1, 2, 3],
... [ 4, 5, 6, 8, 9, 5, 6, 7],
... [ 8, 9, 10, 12, 13, 9, 10, 11]])
>>> from collections import Counter
>>> Counter(map(tuple, a.T))
Counter({(2, 6, 10): 2, (1, 5, 9): 2, (4, 8, 12): 1, (5, 9, 13): 1, (3, 7, 11): 1, (0, 4, 8): 1})
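If you want the result back as arrays (unique columns plus their counts) rather than a Counter, a small follow-up sketch (the variable names `uniq` and `freq` are mine):

```python
import numpy as np
from collections import Counter

a = np.array([[ 0,  1,  2,  4,  5,  1,  2,  3],
              [ 4,  5,  6,  8,  9,  5,  6,  7],
              [ 8,  9, 10, 12, 13,  9, 10, 11]])

counts = Counter(map(tuple, a.T))

# Unique columns as rows (first-seen order on Python 3.7+),
# with a parallel array of their counts.
uniq = np.array(list(counts.keys()))
freq = np.array(list(counts.values()))
```

`uniq.T` recovers the unique columns in the original column orientation.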
Upvotes: 5