Reputation: 3275
numpy.histogram(data, bins) is a very fast and efficient way to count how many elements of the array data fall into each bin defined by the array bins. Is there an equivalent function for the following problem? I have a matrix with R rows and C columns, and I want to bin each row of the matrix separately, using the bin edges given by bins. The result should be another matrix with R rows and one column per bin.
I tried passing the matrix to numpy.histogram(data, bins), but the matrix is treated as a flat array with R*C elements, so the result is a single array with Nbins elements.
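To make the problem concrete, here is a small sketch (the random 4 × 6 matrix and the 5 equal-width bins are illustrative choices, not from the question) showing that np.histogram flattens a 2-D input, together with the straightforward row-by-row loop that produces the desired R × Nbins result. The answers below are ways to get the same matrix faster than this loop.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((4, 6))          # R = 4 rows, C = 6 columns (illustrative)
bins = np.linspace(0.0, 1.0, 6)    # 5 equal-width bins on [0, 1]

# Passing the whole matrix: numpy flattens it into one histogram of R*C values
flat_counts, _ = np.histogram(data, bins)
print(flat_counts.shape)           # (5,)

# Baseline: one np.histogram call per row, stacked into an R x Nbins matrix
per_row = np.array([np.histogram(row, bins)[0] for row in data])
print(per_row.shape)               # (4, 5)
```

Summing the per-row counts over the rows gives back the flattened histogram, since both use the same edges.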
Upvotes: 1
Views: 2202
Reputation: 3275
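Thank you everybody for your answers and comments. I finally found a way to speed up the binning procedure. Instead of locating each value's bin with bins.searchsorted(data), I compute the bin index directly, as np.array(data*nbins, dtype=int) on the rescaled data. Substituting this line into the code posted by Bi Rico made it about a factor of 3 faster. Note that this shortcut is only valid when the bins are equally spaced, because the index is computed arithmetically instead of being searched for. Below is Bi Rico's function with my modification, so that other users can easily reuse it.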
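import numpy as np

def hist_per_row(data, bins):
    data = np.asarray(data, dtype=float)
    bins = np.asarray(bins, dtype=float)
    # The arithmetic shortcut is only valid for equally spaced bin edges
    assert np.allclose(np.diff(bins), bins[1] - bins[0])
    r, c = data.shape
    nbins = len(bins) - 1
    # Rescale so that [bins[0], bins[-1]) maps onto [0, 1)
    scaled = (data - bins[0]) / (bins[-1] - bins[0])
    # Bin index computed directly: 1..nbins for in-range values; clipping sends
    # values below bins[0] to 0 and values at or above bins[-1] to nbins + 1,
    # so they cannot spill into a neighbouring row's block
    idx = np.floor(scaled * nbins).astype(int) + 1
    idx = np.clip(idx, 0, nbins + 1)
    step = len(bins) + 1
    last = step * r
    # Shift row i's indices into the block [i*step, (i+1)*step)
    idx += np.arange(0, last, step).reshape((r, 1))
    res = np.bincount(idx.ravel(), minlength=last)
    res = res.reshape((r, step))
    # Drop the under/overflow columns
    return res[:, 1:-1]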
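The speed-up comes from replacing a binary search per value with a constant-time computation. The sketch below illustrates that, for equally spaced edges, floor((x - bins[0]) / width) + 1 gives the same index as bins.searchsorted(x, side="right") (the side="right" convention matches np.histogram's half-open bins). The 8 bins on [0, 1] are an assumption chosen so that every edge is a multiple of 1/8 and the comparison is exact in floating point; with arbitrary widths, values lying within rounding error of an edge can land in the neighbouring bin.

```python
import numpy as np

bins = np.linspace(0.0, 1.0, 9)            # 8 equal-width bins, edges k/8 (exact)
nbins = len(bins) - 1
x = np.random.default_rng(1).random(1000)  # values in [0, 1)

# Binary search per value: O(log nbins)
idx_search = bins.searchsorted(x, side="right")

# Direct arithmetic per value: O(1), valid only because the edges are equally spaced
width = (bins[-1] - bins[0]) / nbins
idx_arith = np.floor((x - bins[0]) / width).astype(int) + 1

print(np.array_equal(idx_search, idx_arith))  # True
```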
Upvotes: 1
Reputation: 25813
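If you're applying this to an array with many rows, this function will give you some speed-up at the cost of some temporary memory: instead of calling np.histogram once per row, it finds every element's bin with a single searchsorted call, shifts each row's bin indices into a separate range, and counts everything with a single np.bincount.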
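import numpy as np

def hist_per_row(data, bins):
    data = np.asarray(data)
    bins = np.asarray(bins)
    assert np.all(bins[:-1] <= bins[1:])  # bin edges must be sorted
    r, c = data.shape
    # Bin index of every value; side="right" puts a value equal to an interior
    # edge in the upper bin, as np.histogram does
    # (0 = below bins[0], len(bins) = at or above bins[-1])
    idx = bins.searchsorted(data, side="right")
    step = len(bins) + 1
    last = step * r
    # Shift row i's indices into the block [i*step, (i+1)*step)
    idx += np.arange(0, last, step).reshape((r, 1))
    res = np.bincount(idx.ravel(), minlength=last)
    res = res.reshape((r, step))
    return res[:, 1:-1]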
The res[:, 1:-1] on the last line is there to be consistent with numpy.histogram, which returns an array of length len(bins) - 1, but you could drop it if you also want to count the values that fall below bins[0] and beyond bins[-1] (they end up in the first and last columns, respectively).
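A worked toy example of the offset-and-bincount idea, with small hand-picked values (an illustrative assumption) so the counts can be checked by eye: each row gets its own block of len(bins) + 1 slots in one long count array, and a single np.bincount call fills all rows at once.

```python
import numpy as np

data = np.array([[0.1, 0.4, 0.4, 0.9],
                 [0.6, 0.6, 0.6, 0.2]])  # R = 2 rows (toy values)
bins = np.array([0.0, 0.5, 1.0])          # 2 bins: [0, 0.5) and [0.5, 1]
r = data.shape[0]
step = len(bins) + 1                      # nbins + 2 slots per row (under/overflow)

# Bin index within each row (0 = below bins[0], len(bins) = at/above bins[-1])
idx = bins.searchsorted(data, side="right")
# Row 0 uses slots 0..3, row 1 uses slots 4..7
idx += np.arange(0, step * r, step).reshape((r, 1))
counts = np.bincount(idx.ravel(), minlength=step * r).reshape((r, step))[:, 1:-1]
print(counts)
# [[3 1]
#  [1 3]]
```

Each row matches what np.histogram(data[i], bins)[0] returns for that row.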
Upvotes: 2
Reputation: 10759
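Something along these lines?
import numpy as np
data = np.random.rand(10, 20)
# Shared bin edges, so that every row is binned over the same intervals
bins = np.linspace(0, 1, 11)
print(np.apply_along_axis(lambda x: np.histogram(x, bins)[0], 1, data))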
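Whichever form is used, it matters that the same bin edges are passed for every row: with only an integer number of bins (or none, which means 10), np.histogram spans each row's own minimum and maximum, so a given column of the result would not refer to the same interval in every row. The toy rows and the 4 shared bins below are illustrative assumptions. Note also that apply_along_axis still calls np.histogram once per row in Python, so it is mainly a convenience rather than a speed-up.

```python
import numpy as np

data = np.array([[0.0, 1.0, 0.5],
                 [0.0, 2.0, 1.0]])

# Default bins: the edges follow each row's own [min, max]
edges_row0 = np.histogram(data[0])[1]
edges_row1 = np.histogram(data[1])[1]
print(edges_row0[-1], edges_row1[-1])  # 1.0 2.0

# Explicit shared edges: every column means the same interval for every row
shared = np.linspace(0.0, 2.0, 5)      # [0, 0.5, 1, 1.5, 2]
counts = np.apply_along_axis(lambda x: np.histogram(x, shared)[0], 1, data)
print(counts)
# [[1 1 1 0]
#  [1 0 1 1]]
```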
Upvotes: 0