Garrett Berg

Reputation: 2635

Numpy: Find the values of rows given an array of indexes

I have a 2D array of values and a 1D array of indexes. For each row, I want to pull out the value at the column given by the corresponding entry of the index array. The following code does this successfully:

from pprint import pprint
import numpy as np
_2Darray = np.arange(100, dtype = np.float16)
_2Darray = _2Darray.reshape((10, 10))
array_indexes = [5,5,5,4,4,4,6,6,6,8]
index_values = []
for row, index in enumerate(array_indexes):
    index_values.append(_2Darray[row, index])
pprint(_2Darray)
print(index_values)

Returns

array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.],
       [ 40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.],
       [ 50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.],
       [ 60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.],
       [ 70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.],
       [ 80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.],
       [ 90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.]], dtype=float16)
[5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]

But I want to do it using only numpy functions. I have tried a whole bunch of numpy functions, but none of them seems to do this fairly simple task.

Thanks in advance!


Edit: I managed to figure out what my implementation would be:

index_values = np.fromiter((_2Darray[ind[0], ind[1]] for ind in
                    enumerate(array_indexes)),
                    dtype = _2Darray.dtype,
                    count = len(_2Darray))

Thanks to root I've got both implementations worked out. Now for some profiling. My implementation, run through cProfile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    2    0.274    0.137    0.622    0.311 {numpy.core.multiarray.fromiter}
20274    0.259    0.000    0.259    0.000 lazer_np.py:86(<genexpr>)

And root's:

    4    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
    1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.arange}

I can't believe it, but cProfile does not register root's method as taking any time at all. I think this must be some kind of bug, but it is definitely noticeably faster. On an earlier test I measured root's to be about 3 times faster.

Note: these tests were done on a shape = (20273, 200) array of np.float16 values. Additionally, each indexing had to be run twice for each test.
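For anyone who wants to repeat the comparison without cProfile, here is a rough sketch using `timeit` on an array of the same shape. The random data, the function names `with_fromiter` and `with_fancy_indexing`, and the repeat count are my own assumptions, not the original test setup, so the numbers it prints will differ from those above.

```python
import timeit

import numpy as np

# Stand-in data with the shape and dtype mentioned in the note above.
rng = np.random.default_rng(0)
data = rng.random((20273, 200), dtype=np.float32).astype(np.float16)
cols = rng.integers(0, data.shape[1], size=data.shape[0])


def with_fromiter(arr, idx):
    # Python-level generator feeding np.fromiter, one element per row.
    return np.fromiter((arr[r, c] for r, c in enumerate(idx)),
                       dtype=arr.dtype, count=len(arr))


def with_fancy_indexing(arr, idx):
    # Integer array indexing: pairs row i with column idx[i].
    return arr[np.arange(len(arr)), idx]


# Both approaches should select exactly the same elements.
same = np.array_equal(with_fromiter(data, cols),
                      with_fancy_indexing(data, cols))

t_fromiter = timeit.timeit(lambda: with_fromiter(data, cols), number=3)
t_fancy = timeit.timeit(lambda: with_fancy_indexing(data, cols), number=3)
print("results match:", same)
print("fromiter: %.4f s, fancy indexing: %.4f s" % (t_fromiter, t_fancy))
```

Wall-clock timing like this avoids relying on the profiler's per-call resolution, which may be why the rows for root's method show 0.000.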

Upvotes: 0

Views: 868

Answers (3)

root

Reputation: 80426

In [15]: _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
Out[15]: array([  5.,  15.,  25.,  34.,  44.,  54.,  66.,  76.,  86.,  98.],
         dtype=float16)

BUT, I think something based on your solution may actually be faster on smaller arrays. If the arrays are bigger than 100*100, use numpy indexing.

In [22]: def f(array, indices):
    ...:     return [array[row, index] for row, index in enumerate(indices)]

In [23]: f(_2Darray, [5,5,5,4,4,4,6,6,6,8])
Out[23]: [5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]

In [27]: %timeit f(_2Darray,[5,5,5,4,4,4,6,6,6,8])
100000 loops, best of 3: 7.48 us per loop

In [28]: %timeit _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
10000 loops, best of 3: 24.2 us per loop

Upvotes: 3

Bi Rico

Reputation: 25833

This should do it:

row = numpy.arange(_2Darray.shape[0])
index_values = _2Darray[row, array_indexes]

Numpy allows you to index 2d arrays (or nd arrays really) with two arrays such that:

result1 = numpy.empty(len(row), dtype=array.dtype)
for i in range(len(row)):
    result1[i] = array[row[i], col[i]]

result2 = array[row, col]
numpy.all(result1 == result2)  # True
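For a self-contained check, here is a minimal sketch using the data from the question. The names `values`, `rows` and `cols` are my own stand-ins for `_2Darray`, `row` and `array_indexes`. `np.take_along_axis` is an alternative that nobody in this thread used; it is included only for comparison.

```python
import numpy as np

# Same data as in the question.
values = np.arange(100, dtype=np.float16).reshape(10, 10)
cols = np.array([5, 5, 5, 4, 4, 4, 6, 6, 6, 8])
rows = np.arange(values.shape[0])

# Integer array indexing: element i of the result is values[rows[i], cols[i]].
picked = values[rows, cols]

# Alternative: take_along_axis needs an index array with the same number of
# dimensions as values, so cols is turned into a column and squeezed back.
alt = np.take_along_axis(values, cols[:, None], axis=1)[:, 0]

print(picked.tolist())
# [5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]
```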

Upvotes: 5

chiffa

Reputation: 2088

You have to be careful to use numpy functions designed specifically for arrays, not for matrices. The two are easy to confuse and do not raise an error when methods of one are called on the other, yet the output is pretty much unpredictable.
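As one concrete illustration, assuming "matrices" here means the `np.matrix` class, the same indexing expression from the other answers gives a differently shaped result on a matrix than on a plain array. This is only a sketch; the variable names are mine, and `np.matrix` may emit a deprecation warning on recent NumPy versions.

```python
import numpy as np

arr = np.arange(100, dtype=np.float16).reshape(10, 10)
mat = np.asmatrix(arr)  # same data viewed as np.matrix (assumed meaning of "matrix")
cols = [5, 5, 5, 4, 4, 4, 6, 6, 6, 8]
rows = np.arange(10)

from_array = arr[rows, cols]   # plain ndarray: 1-D result
from_matrix = mat[rows, cols]  # np.matrix: result is kept 2-D

print(from_array.shape)   # (10,)
print(from_matrix.shape)  # (1, 10)
```

The selected values are the same in both cases; only the shape differs, which is enough to break code that expects a 1-D result.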

Upvotes: 0
