Reputation: 2635
I have a 2D array of values and a 1D array of indexes. For each row of the 2D array, I want to pull out the value at the column given by the corresponding entry of the index array. The following code does this successfully:
from pprint import pprint
import numpy as np
_2Darray = np.arange(100, dtype = np.float16)
_2Darray = _2Darray.reshape((10, 10))
array_indexes = [5,5,5,4,4,4,6,6,6,8]
index_values = []
for row, index in enumerate(array_indexes):
    index_values.append(_2Darray[row, index])
pprint(_2Darray)
print(index_values)
Returns
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[ 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.],
[ 20., 21., 22., 23., 24., 25., 26., 27., 28., 29.],
[ 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.],
[ 40., 41., 42., 43., 44., 45., 46., 47., 48., 49.],
[ 50., 51., 52., 53., 54., 55., 56., 57., 58., 59.],
[ 60., 61., 62., 63., 64., 65., 66., 67., 68., 69.],
[ 70., 71., 72., 73., 74., 75., 76., 77., 78., 79.],
[ 80., 81., 82., 83., 84., 85., 86., 87., 88., 89.],
[ 90., 91., 92., 93., 94., 95., 96., 97., 98., 99.]], dtype=float16)
[5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]
But I want to do it using only numpy functions. I have tried a whole bunch of numpy functions, but none of them seem to do this fairly simple task.
Thanks in advance!
Edit: I managed to figure out what my implementation would be:
index_values = np.fromiter(
    (_2Darray[row, index] for row, index in enumerate(array_indexes)),
    dtype=_2Darray.dtype,
    count=len(_2Darray))
Thanks to root I've got both implementations worked out. Now for some profiling. My implementation, run through cProfile:
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.274 0.137 0.622 0.311 {numpy.core.multiarray.fromiter}
20274 0.259 0.000 0.259 0.000 lazer_np.py:86(<genexpr>)
And root's:
4 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.arange}
I can't believe it, but cProfile does not report root's method as taking any time at all. I think this must be some kind of bug, but it is definitely noticeably faster. On an earlier test, root's method came out about 3 times faster.
Note: these tests were done on a shape = (20273, 200) array of np.float16 values. Additionally, each indexing had to be run twice for each test.
Upvotes: 0
Views: 868
Reputation: 80426
In [15]: _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
Out[15]: array([ 5., 15., 25., 34., 44., 54., 66., 76., 86., 98.],
dtype=float16)
However, I think something based on your solution may actually be faster on smaller arrays. If the arrays are bigger than 100*100, use numpy indexing.
In [22]: def f(array, indices):
   ....:     return [array[row, index] for row, index in enumerate(indices)]
In [23]: f(_2Darray, [5,5,5,4,4,4,6,6,6,8])
Out[23]: [5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]
In [27]: %timeit f(_2Darray,[5,5,5,4,4,4,6,6,6,8])
100000 loops, best of 3: 7.48 us per loop
In [28]: %timeit _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
10000 loops, best of 3: 24.2 us per loop
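To see where the crossover falls on a given machine, the comparison above can be repeated for several array sizes. The following is a minimal sketch, not a rigorous benchmark: the helper names `loop_lookup` and `fancy_lookup`, the sizes tried, and the use of `float32` (so the larger arrays stay within range) are all assumptions made for illustration, and the timings will differ between machines and numpy versions.

```python
import timeit

import numpy as np


def loop_lookup(array, indices):
    # Pure-Python loop: one scalar lookup per row.
    return [array[row, index] for row, index in enumerate(indices)]


def fancy_lookup(array, indices):
    # Integer array indexing: one vectorized lookup for all rows.
    return array[np.arange(len(array)), indices]


rng = np.random.default_rng(0)
timings = {}
for n in (10, 100, 1000):
    array = np.arange(n * n, dtype=np.float32).reshape(n, n)
    indices = rng.integers(0, n, size=n)

    # Both approaches must agree before their timings are worth comparing.
    assert np.array_equal(loop_lookup(array, indices), fancy_lookup(array, indices))

    t_loop = min(timeit.repeat(lambda: loop_lookup(array, indices), number=20, repeat=3))
    t_fancy = min(timeit.repeat(lambda: fancy_lookup(array, indices), number=20, repeat=3))
    timings[n] = (t_loop, t_fancy)
    print("n=%d: loop %.6fs, indexing %.6fs (20 calls each)" % (n, t_loop, t_fancy))
```

The loop does work proportional to the number of rows in the interpreter, while the indexing call pays a roughly fixed setup cost and then runs in compiled code, which is why the loop can win only on small inputs.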
Upvotes: 3
Reputation: 25833
This should do it:
row = numpy.arange(_2Darray.shape[0])
index_values = _2Darray[row, array_indexes]
Numpy allows you to index 2d arrays (or nd arrays really) with two arrays such that:
result1 = numpy.empty(len(row), dtype=array.dtype)
for i in range(len(row)):
    result1[i] = array[row[i], col[i]]
result2 = array[row, col]
numpy.all(result1 == result2)  # True: both pick the same elements
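Here is a self-contained version of the same idea, using the array and indexes from the question. The names `array`, `row`, `col`, `picked`, and `alt` are chosen for illustration only. The `np.take_along_axis` call is not part of this answer; it is shown as one possible alternative, assuming a numpy version that provides it (1.15 or later).

```python
import numpy as np

# The 10x10 array and per-row column indexes from the question.
array = np.arange(100, dtype=np.float16).reshape(10, 10)
col = np.array([5, 5, 5, 4, 4, 4, 6, 6, 6, 8])
row = np.arange(array.shape[0])

# Integer array indexing: element i of the result is array[row[i], col[i]].
picked = array[row, col]

# Alternative: take one element per row along axis 1.
# The index array must have the same number of dimensions as `array`.
alt = np.take_along_axis(array, col[:, None], axis=1)[:, 0]

print(picked)
print(np.array_equal(picked, alt))
```

Both forms return a 1D array with one value per row, so either can replace the explicit loop.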
Upvotes: 5
Reputation: 2088
You have to take care to use the numpy functions designed specifically for arrays, not those for matrices. The two are easy to confuse, and calling the methods of one on the other does not raise an error, yet the output can be quite unpredictable.
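One concrete case where the two types differ is the indexing discussed in this thread. The sketch below is only an illustration with made-up variable names (`arr`, `mat`, `rows`, `cols`); it assumes a numpy version that still ships `np.matrix`, which recent releases discourage in favour of plain arrays.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
mat = np.matrix(arr)  # same data, but a matrix is always two-dimensional

rows = np.arange(3)
cols = np.array([0, 1, 2])

from_array = arr[rows, cols]   # 1D result with one value per row
from_matrix = mat[rows, cols]  # same values, but kept two-dimensional

print(from_array.shape)
print(from_matrix.shape)

# Converting to a plain array first restores the array behaviour.
print(np.asarray(mat)[rows, cols].shape)
```

No error is raised in either case; only the shape of the result changes, which is the kind of silent difference to watch for.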
Upvotes: 0