LostInPy
LostInPy

Reputation: 11

NumPy: indexing array by list of tuples - how to do it correctly?

I am in the following situation - I have the following:

What I want: from a, return an array b with k scalar elements, the ith element in b being the result of indexing a with the ith tuple from t.

Seems trivial enough. The following approach, however, does not work

def get(a, t):
    # wrong result + takes way too long
    return a[t]

I have to resort to doing this iteratively i.e. the following works correctly:

def get(a, t):
    res = []
    for ind in t:
        a_scalar = a
        for i in ind:
            a_scalar = a_scalar[i]

        # a_scalar is now a scalar
        res.append(a_scalar)

    return res

This works, except for the fact that given that each dimension in a has over 30 elements, the procedure does get really slow when n gets to more than 5. I understand that it would be slow regardless, however, I would like to exploit numpy's capabilities as I believe it would speed up this process considerably.

Upvotes: 0

Views: 2077

Answers (2)

Tls Chris
Tls Chris

Reputation: 3824

Explore using np.ravel_multi_index

Create some test data

arr = np.arange(10**4)
arr.shape=10,10,10,10
t = []
for j in range(5):
    t.append( tuple(np.random.randint(10, size = 4)))

print(t)
# [(1, 8, 2, 0),
#  (2, 3, 3, 6),
#  (1, 4, 8, 5),
#  (2, 2, 6, 3),
#  (0, 5, 0, 2),]

ta = np.array(t).T
print(ta)
# array([[1, 2, 1, 2, 0],
#        [8, 3, 4, 2, 5],
#        [2, 3, 8, 6, 0],
#        [0, 6, 5, 3, 2]])

arr.ravel()[np.ravel_multi_index(tuple(ta), (10,10,10,10))]
# array([1820, 2336, 1485, 2263,  502]

np.ravel_multi_index basically calculates, from the tuple of input arrays, the index into a flattened array that starts with shape (in this case) (10, 10, 10, 10).

Does this do what you need? Is it fast enough?

Upvotes: 0

hpaulj
hpaulj

Reputation: 231385

The key to getting this right is to understand the roles of indexing lists and tuples. Often the two are treated the same, but in numpy indexing, tuples, list and arrays convey different information.

In [1]: a = np.arange(12).reshape(3,4)                                          
In [2]: t = np.array([(0,0),(1,1),(2,2)])                                       

In [4]: a                                                                       
Out[4]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [5]: t                                                                       
Out[5]: 
array([[0, 0],
       [1, 1],
       [2, 2]])

You tried:

In [6]: a[t]                                                                    
Out[6]: 
array([[[ 0,  1,  2,  3],
        [ 0,  1,  2,  3]],

       [[ 4,  5,  6,  7],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [ 8,  9, 10, 11]]])

So what's wrong with it? It ran, but selected a (3,2) array of rows of a. That is, it applied t to just the first dimension, effectively a[t, :]. You want to index on all dimensions, some sort of a[t1, t2]. That's the same as a[(t1,t2)] - a tuple of indices.

In [10]: a[tuple(t[0])]                # a[(0,0)]                                         
Out[10]: 0
In [11]: a[tuple(t[1])]                # a[(1,1)]                                         
Out[11]: 5
In [12]: a[tuple(t[2])]                                                         
Out[12]: 10

or doing all at once:

In [13]: a[(t[:,0], t[:,1])]                                                      
Out[13]: array([ 0,  5, 10])

Another way to write it, is n lists (or arrays), one for each dimension:

In [14]: a[[0,1,2],[0,1,2]]                                                     
Out[14]: array([ 0,  5, 10])

In [18]: tuple(t.T)                                                             
Out[18]: (array([0, 1, 2]), array([0, 1, 2]))
In [19]: a[tuple(t.T)]                                                          
Out[19]: array([ 0,  5, 10])

More generally, in a[idx1, idx2] array idx1 is broadcast against idx2 to produce a full selection array. Here the 2 arrays are 1d and match, the selection is your t set of pairs. But the same principle applies to selecting a set of rows and columns, a[ [[0],[2]], [0,2,3] ].

Using the ideas in [10] and following, your get could be sped up with:

In [20]: def get(a, t): 
    ...:     res = [] 
    ...:     for ind in t: 
    ...:         res.append(a[tuple(ind)])  # index all dimensions at once 
    ...:     return res 
    ...:                                                                        
In [21]: get(a,t)                                                               
Out[21]: [0, 5, 10]

If t really was a list of tuples (as opposed to an array built from them), your get could be:

In [23]: tl = [(0,0),(1,1),(2,2)]                                               
In [24]: [a[ind] for ind in tl]                                                 
Out[24]: [0, 5, 10]

Upvotes: 1

Related Questions