maxymoo

Reputation: 36545

Apply bincount to each row of a 2D numpy array

Is there a way to apply bincount with "axis = 1"? The desired result would be the same as the list comprehension:

import numpy as np
A = np.array([[1,0],[0,0]])
np.array([np.bincount(r,minlength = np.max(A) + 1) for r in A])

#array([[1, 1],
#       [2, 0]])

Upvotes: 6

Views: 5522

Answers (3)

sushmit

Reputation: 4603

You can use np.apply_along_axis. Here is an example:

import numpy as np
test_array = np.array([[0, 0, 1], [0, 0, 1]])
print(test_array)
np.apply_along_axis(np.bincount, axis=1, arr=test_array,
                    minlength=np.max(test_array) + 1)

Note that the shape of the result depends on the number of bins. Any extra arguments given to apply_along_axis (here, minlength) are forwarded to the function being applied.
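As a quick check against the question, here is a minimal sketch that runs apply_along_axis on the array from the question and compares it with the list comprehension. The names n_bins, row_counts and expected are only illustrative. Passing minlength matters here: without it, rows with different maxima would return arrays of different lengths, which cannot be stacked into one 2D result.

```python
import numpy as np

# Array from the question
A = np.array([[1, 0], [0, 0]])

# Use one common bin count so every row yields a result of equal length
n_bins = A.max() + 1

# minlength is forwarded to np.bincount for each row
row_counts = np.apply_along_axis(np.bincount, 1, A, minlength=n_bins)

# Reference result from the list comprehension in the question
expected = np.array([np.bincount(r, minlength=n_bins) for r in A])

print(row_counts)
print(np.array_equal(row_counts, expected))
```

Keep in mind that apply_along_axis still calls the function once per row, so it is mainly a convenience rather than a speed-up over the comprehension.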

Upvotes: 5

Divakar

Reputation: 221574

np.bincount only accepts 1D input, so it cannot be applied along an axis of a 2D array directly. To get the desired result with a single vectorized call to np.bincount, create an array of IDs in which equal values in different rows map to different IDs. That way, elements from different rows are never counted in the same bin. Such an ID array can be built with linear indexing in mind, by offsetting each row by a multiple of the number of bins:

N = A.max() + 1  # number of bins per row
ids = A + (N * np.arange(A.shape[0]))[:, None]  # offset row i by i*N

Then feed the IDs to np.bincount and reshape the result back to 2D:

np.bincount(ids.ravel(), minlength=N * A.shape[0]).reshape(-1, N)
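Putting the two steps together, a minimal sketch could look like the following. The function name bincount2d and the second test array B are made up for illustration, and the sketch assumes a 2D array of non-negative integers.

```python
import numpy as np

def bincount2d(arr):
    """Row-wise bincount of a 2D array of non-negative integers."""
    arr = np.asarray(arr)
    n_rows = arr.shape[0]
    n_bins = arr.max() + 1
    # Shift row i by i * n_bins so equal values in different rows get distinct IDs
    offsets = n_bins * np.arange(n_rows)[:, None]
    ids = arr + offsets
    # One flat bincount over all IDs, then fold back into (n_rows, n_bins)
    counts = np.bincount(ids.ravel(), minlength=n_bins * n_rows)
    return counts.reshape(n_rows, n_bins)

# Array from the question
A = np.array([[1, 0], [0, 0]])
print(bincount2d(A))

# A slightly larger (hypothetical) input, checked against the per-row loop
B = np.array([[0, 3, 3, 1], [2, 2, 2, 0], [1, 1, 0, 0]])
loop = np.array([np.bincount(r, minlength=B.max() + 1) for r in B])
print(np.array_equal(bincount2d(B), loop))
```

The minlength argument guarantees that the flat result has exactly n_rows * n_bins entries, even when the last row does not contain the largest value, so the reshape always succeeds.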

Upvotes: 8

maxymoo

Reputation: 36545

If the data is too large for this to be efficient, the bottleneck is more likely the memory used by the dense result matrix than the numerical operations themselves. Here is an example using scikit-learn's HashingVectorizer on a matrix that is too large for the bincount method (the result is a sparse matrix):

import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
h = HashingVectorizer()
A = np.random.randint(100,size=(1000,100))*10000
A_str = [" ".join([str(v) for v in i]) for i in A]

%timeit h.fit_transform(A_str)
#10 loops, best of 3: 110 ms per loop
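For comparison, here is a rough NumPy-only sketch of the same idea of avoiding the dense matrix. It is not the HashingVectorizer approach. It reuses the row-offset IDs from the bincount answer and passes them to np.unique, which returns only the (row, value, count) triples that actually occur. The seeded generator and all variable names are assumptions made for the illustration.

```python
import numpy as np

# Same shape and value range as above, with a seeded generator for repeatability
rng = np.random.default_rng(0)
A = rng.integers(100, size=(1000, 100)) * 10000

# A dense bincount result would need n_bins columns per row (close to a million here)
n_bins = int(A.max()) + 1

# Row-offset IDs: equal values in different rows map to different IDs
ids = A + n_bins * np.arange(A.shape[0])[:, None]

# Count only the IDs that occur, instead of allocating every possible bin
unique_ids, counts = np.unique(ids.ravel(), return_counts=True)

# Decode each ID back into its row index and original value
rows, values = np.divmod(unique_ids, n_bins)

print(len(counts), "nonzero counts stored")
```

The three arrays rows, values and counts form a coordinate-style sparse representation, so memory grows with the number of distinct (row, value) pairs rather than with the number of rows times the number of bins.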

Upvotes: 2
