Reputation: 187
If I have two numpy arrays of values, how can I quickly make a third array that counts how many times each pair of values occurs in the first two arrays?
For example:
import numpy as np
x = np.round(np.random.random(2500),2)
xIndex = np.linspace(0, 1, 100)
y = np.round(np.random.random(2500)*10,2)
yIndex = np.linspace(0, 10, 1000)
z = np.zeros((100,1000))
Right now, I'm doing the following loop (which is prohibitively slow):
for m in x:
    for n in y:
        q = np.where(xIndex == m)[0][0]
        l = np.where(yIndex == n)[0][0]
        z[q][l] += 1
Then I can do a contour plot (or heat map, or whatever) of xIndex, yIndex, and z. But I know this isn't a "Pythonic" way of solving it, and there's no way I can run it over the hundreds of millions of data points I have in anything approaching a reasonable timeframe.
How do I do this the right way? Thanks for reading!
Upvotes: 2
Views: 169
Reputation: 114330
You can shorten the code dramatically.
First, since you're binning on a linear scale, you can eliminate the explicit arrays xIndex and yIndex entirely. You can express the exact indices into z as
xi = np.round(np.random.random(2500) * 100).astype(int)
yi = np.round(np.random.random(2500) * 1000).astype(int)
# Rounding can yield 100 and 1000, so z needs shape (101, 1001) to hold every index.
Second, you don't need the loop. The issue with the normal + operator (a.k.a. np.add) is that it's buffered: with a fancy index, z[...] += 1 increments each distinct cell only once, so you won't get the right count for multiple occurrences of the same index. Fortunately, ufuncs have an at method to handle that, and add is a ufunc.
Third, and finally, broadcasting allows you to specify how to mesh the arrays for a fancy index:
np.add.at(z, (xi[:, None], yi), 1)
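To see the difference concretely, here is a minimal sketch on a tiny made-up example. The index arrays and the names expected, buffered, and unbuffered are illustrative only and are not the asker's data. It compares the nested loop from the question, buffered in-place addition, and np.add.at with the same broadcast index:

```python
import numpy as np

# Tiny illustrative index arrays with deliberate repeats (hypothetical data).
xi = np.array([0, 0, 2])
yi = np.array([1, 1, 3, 3])

# Reference: the nested loop over every (x, y) combination.
expected = np.zeros((3, 4), dtype=int)
for q in xi:
    for l in yi:
        expected[q, l] += 1

# Buffered fancy-index addition: each distinct cell is incremented only once.
buffered = np.zeros((3, 4), dtype=int)
buffered[xi[:, None], yi] += 1

# Unbuffered addition: repeated indices accumulate.
unbuffered = np.zeros((3, 4), dtype=int)
np.add.at(unbuffered, (xi[:, None], yi), 1)

print(np.array_equal(unbuffered, expected))  # True
print(buffered.max(), unbuffered.max())      # 1 4
```

The buffered version tops out at 1 per cell, while np.add.at reproduces the loop's counts.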
If you're building a 2D histogram, you don't need to round the raw data. You can round just the indices instead:
x = np.random.random(2500)
y = np.random.random(2500) * 10
z = np.zeros((101, 1001))  # np.round can produce index 100 for x and 1000 for y
np.add.at(z, (np.round(100 * x).astype(int), np.round(100 * y).astype(int)), 1)
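For very large inputs, np.add.at may be slower than other counting approaches, depending on the NumPy version. As a hedged alternative (a swapped-in technique, not part of the method above), the paired indices can be flattened with np.ravel_multi_index and counted with np.bincount. The names rng, shape, flat, z_at, and z_bincount are illustrative, the seed is only there to make the sketch repeatable, and the array shape is assumed to be (101, 1001) so that the largest rounded indices fit:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
x = rng.random(2500)
y = rng.random(2500) * 10

# Same paired indices as above; rounding can reach 100 and 1000 inclusive.
shape = (101, 1001)
xi = np.round(100 * x).astype(int)
yi = np.round(100 * y).astype(int)

# Reference result with np.add.at.
z_at = np.zeros(shape, dtype=int)
np.add.at(z_at, (xi, yi), 1)

# Alternative: flatten each (xi, yi) pair to one index and count occurrences.
flat = np.ravel_multi_index((xi, yi), shape)
z_bincount = np.bincount(flat, minlength=shape[0] * shape[1]).reshape(shape)

print(np.array_equal(z_at, z_bincount))  # both approaches agree
print(z_bincount.sum())                  # one count per input point
```

Both arrays hold the same counts, so the choice between them is a matter of speed and memory on your data.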
Upvotes: 4