Reputation: 79
I have two NumPy arrays. In my case Y contains outputs and P the probability that each output is correct; both have shape (outputs, noOfAnswers), and in general outputs is much larger than noOfAnswers. I select the two most significant results according to P with:
chooseThem = np.argpartition(P,-2,axis=1)[:,-2:]
Now I wish to create a new array YNew of shape (outputs, 2) holding just the values selected by chooseThem. With a for loop this is straightforward, but the performance is not OK.
Here is an example for the "bad approach" with some artificial arrays:
import numpy as np
Y = 4*(np.random.rand(1000,6)-0.5)
P = np.random.rand(1000,6)
biggest2 = np.argpartition(P,-2,axis=1)[:,-2:]
YNew = np.zeros((1000,2))
for j in range(2):
    for i in range(1000):
        YNew[i,j] = Y[i,biggest2[i,j]]
Does anyone have a suggestion for a fast way to create this new array?
Upvotes: 1
Views: 253
Reputation: 9264
This works, slicing the array with an explicit row-index array (fancy indexing):
dex = np.array([np.arange(1000),np.arange(1000)]).T
YNew = Y[dex,biggest2]
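As a side note, the row-index array does not have to be built explicitly: a single column of row indices broadcasts against the (1000, 2) column indices to the same effect. A minimal sketch of that equivalent variant:

```python
import numpy as np

Y = 4 * (np.random.rand(1000, 6) - 0.5)
P = np.random.rand(1000, 6)
biggest2 = np.argpartition(P, -2, axis=1)[:, -2:]

# A (1000, 1) column of row indices broadcasts against the
# (1000, 2) column indices, so no stacked dex array is needed:
rows = np.arange(1000)[:, None]
YNew = Y[rows, biggest2]   # shape (1000, 2)
```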
Some timing results (old = loop method, new = index method):
1000 rows
%timeit new(Y,P,1000,biggest2)
The slowest run took 4.47 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.1 µs per loop
%timeit old(Y,P,1000,biggest2)
1000 loops, best of 3: 853 µs per loop
100000 rows
%timeit new(Y,P,100000,biggest2)
100 loops, best of 3: 4.49 ms per loop
%timeit old(Y,P,100000,biggest2)
10 loops, best of 3: 89.4 ms per loop
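On NumPy 1.15 and later, `np.take_along_axis` expresses the same "pick these indices in each row" operation directly, without constructing any row-index array. A sketch:

```python
import numpy as np

Y = 4 * (np.random.rand(1000, 6) - 0.5)
P = np.random.rand(1000, 6)
biggest2 = np.argpartition(P, -2, axis=1)[:, -2:]

# take_along_axis pairs each index in biggest2 with its own row of Y:
YNew = np.take_along_axis(Y, biggest2, axis=1)   # shape (1000, 2)
```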
Upvotes: 1