Reputation: 36545
Is there a way to apply np.bincount with "axis = 1"? The desired result would be the same as that of this list comprehension:
import numpy as np
A = np.array([[1, 0], [0, 0]])
# One bincount per row; minlength makes every row produce the same number of bins
np.array([np.bincount(r, minlength=np.max(A) + 1) for r in A])
# array([[1, 1],
#        [2, 0]])
Upvotes: 6
Views: 5522
Reputation: 4603
You can use np.apply_along_axis. Here is an example:
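import numpy as np
test_array = np.array([[0, 0, 1], [0, 0, 1]])
print(test_array)
# Apply np.bincount to every row (axis=1); minlength is forwarded to np.bincount
counts = np.apply_along_axis(np.bincount, axis=1, arr=test_array,
                             minlength=np.max(test_array) + 1)
print(counts)
# [[2 1]
#  [2 1]]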
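Note that the number of columns in the result equals the number of bins, so minlength is what keeps every row's output the same length, which apply_along_axis needs in order to stack the results. You can also pass other keyword arguments through apply_along_axis; they are forwarded to the function being applied.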
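A minimal sketch, reusing the question's A, that checks np.apply_along_axis against the list comprehension and also shows the axis=0 case. The names n_bins, row_counts and col_counts are illustrative only. With axis=0 the bins run down the rows of the result, so column j of col_counts holds the counts for column j of A.

```python
import numpy as np

A = np.array([[1, 0], [0, 0]])
n_bins = A.max() + 1  # same number of bins for every 1D slice

# Row-wise counts: np.bincount is applied to each row (axis=1);
# the minlength keyword is passed through to np.bincount.
row_counts = np.apply_along_axis(np.bincount, 1, A, minlength=n_bins)

# Column-wise counts: same call with axis=0; the bins then lie along
# axis 0 of the result, one column of counts per input column.
col_counts = np.apply_along_axis(np.bincount, 0, A, minlength=n_bins)

print(row_counts)
print(col_counts)
```

apply_along_axis still loops over the slices in Python internally, so it mainly buys convenience over the list comprehension rather than speed.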
Upvotes: 5
Reputation: 221574
np.bincount doesn't work on a 2D array along a given axis. To get the desired effect with a single vectorized call to np.bincount, one can create a 1D array of IDs such that different rows get different IDs even when their elements are the same. This keeps elements from different rows from being binned together when a single call to np.bincount is made with those IDs. Such an ID array can be created with the idea of linear indexing in mind, offsetting each row by its row index times the number of bins, like so:
N = A.max() + 1  # number of bins per row
ids = A + (N * np.arange(A.shape[0]))[:, None]  # shift row i by i*N
Then, feed the IDs to np.bincount and finally reshape back to 2D:
np.bincount(ids.ravel(), minlength=N * A.shape[0]).reshape(-1, N)
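The same steps wrapped into a self-contained function, as a sketch. The name bincount_rows and its minlength parameter are hypothetical, and like np.bincount it assumes a non-empty array of non-negative integers. For A = [[1, 0], [0, 0]], N = 2 and the IDs become [[1, 0], [2, 2]]; one bincount gives [1, 1, 2, 0], which reshapes to [[1, 1], [2, 0]].

```python
import numpy as np

def bincount_rows(A, minlength=0):
    """Count values in each row of a 2D array of non-negative ints.

    Hypothetical helper wrapping the offset-ID trick: one np.bincount call
    for the whole array instead of one call per row.
    """
    A = np.asarray(A)
    if A.ndim != 2:
        raise ValueError("expected a 2D array")
    n_rows = A.shape[0]
    # Bins per row: enough for the largest value, or minlength if that is bigger
    N = max(int(A.max()) + 1, minlength)
    # Shift row i by i*N so equal values in different rows get distinct IDs
    ids = A + (N * np.arange(n_rows))[:, None]
    # One bincount over all IDs, then fold back into N bins per input row
    counts = np.bincount(ids.ravel(), minlength=N * n_rows)
    return counts.reshape(n_rows, N)


A = np.array([[1, 0], [0, 0]])
print(bincount_rows(A))
# [[1 1]
#  [2 0]]
```

The largest ID is N * n_rows - 1, so passing minlength=N * n_rows makes the bincount output exactly N * n_rows long and the reshape always succeeds.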
Upvotes: 8
Reputation: 36545
If the data is too large for this to be efficient, the bottleneck is more likely to be the memory used by the dense result than the numerical operations themselves. Here is an example of using scikit-learn's HashingVectorizer on a matrix that is too large for the bincount method (the result is a sparse matrix):
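import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
# token_pattern=r"\S+" keeps single-character tokens such as "0" (the default
# pattern drops them); norm=None and alternate_sign=False give raw counts
# instead of L2-normalised, sign-alternated values.
h = HashingVectorizer(token_pattern=r"\S+", norm=None, alternate_sign=False)
A = np.random.randint(100, size=(1000, 100)) * 10000
# Turn each row into a whitespace-separated "document" of its values
A_str = [" ".join(str(v) for v in row) for row in A]
%timeit h.fit_transform(A_str)  # IPython magic; returns a scipy.sparse matrix
# 10 loops, best of 3: 110 ms per loop (reported with the default HashingVectorizer())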
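A different technique, swapped in here as a sketch rather than taken from the answer: build the row-wise counts directly as a scipy.sparse matrix. Unlike hashing, this is exact (no hash collisions), and it stores only the non-zero bins. The helper name sparse_bincount_rows is hypothetical, and it assumes a 2D array of non-negative integers; scipy is already a dependency of scikit-learn.

```python
import numpy as np
from scipy import sparse

def sparse_bincount_rows(A):
    """Row-wise value counts of a 2D non-negative int array as a CSR matrix.

    Hypothetical helper: builds a COO matrix with a 1 at (row, value) for
    every element; converting to CSR sums the duplicate entries, which is
    a per-row bincount that stores only the non-zero bins.
    """
    A = np.asarray(A)
    n_rows, n_cols = A.shape
    rows = np.repeat(np.arange(n_rows), n_cols)  # row index of each element (C order)
    cols = A.ravel()                              # each value is its own bin index
    data = np.ones(A.size, dtype=np.int64)
    coo = sparse.coo_matrix((data, (rows, cols)), shape=(n_rows, int(A.max()) + 1))
    return coo.tocsr()  # duplicate (row, value) entries are summed here


A = np.random.randint(100, size=(1000, 100)) * 10000
counts = sparse_bincount_rows(A)
print(counts.shape, counts.nnz)
```

For this A a dense bincount result would need roughly 1000 x 990001 integers, while the CSR matrix holds at most 100 non-zero counts per row.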
Upvotes: 2