DimG
DimG

Reputation: 1781

Vectorized search of element indeces

I have two integer numpy arrays, let's say, arr1 and arr2, that are permutations of range(some_length)

I want to get the third one, where

arr3[idx] = arr1.get_index_of(arr2[idx]) for all idx = 0,1,2,..., some_length-1

here get_index_of method is a pseudo-method of getting index of some element in the collection.

That can be done with naive looping through all the indeces, searching correspondnent element with subsequent assignment of it's index, etc.

But that is slow -- O(n^2). Can it be done faster (At least n*log(n) complexity)? Can it be done via pretty numpy methods? Maybe some sorting with non-trivial key= parameter? Sure there is some elegant solution.

Thank you in advance.

Upvotes: 1

Views: 57

Answers (2)

hpaulj
hpaulj

Reputation: 231550

Lets try a test case

In [166]: arr1=np.random.permutation(10)
In [167]: arr2=np.random.permutation(10)
In [168]: arr1
Out[168]: array([4, 3, 2, 9, 7, 8, 5, 1, 6, 0])
In [169]: arr2
Out[169]: array([9, 2, 6, 4, 0, 3, 1, 7, 8, 5])

np.where(arr1==i) performs your get_index_of method, so your iterative solution is:

In [170]: np.array([np.where(arr1==i)[0] for i in arr2]).flatten()
Out[170]: array([3, 2, 8, 0, 9, 1, 7, 4, 5, 6], dtype=int32)

A vectorized approach is to do an 'outter' comparison between the 2 arrays. This produces a (10,10) array, to which we can apply where to get the indices. Still an O(n^2) method, but it is mostly compiled. On this size of a problem it is 5x faster.

In [171]: np.where(arr1==arr2[:,None])
Out[171]: 
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32),
 array([3, 2, 8, 0, 9, 1, 7, 4, 5, 6], dtype=int32))

Upvotes: 1

behzad.nouri
behzad.nouri

Reputation: 77991

say, a is a permutation of 0..9:

>>> a = np.random.permutation(10)
>>> a
array([3, 7, 1, 8, 2, 4, 6, 0, 9, 5])

then, the indexer array is:

>>> i = np.empty(len(a), dtype='i8')
>>> i[a] = np.arange(len(a))
>>> i
array([7, 2, 4, 0, 5, 9, 6, 1, 3, 8])

this means that, index of say 0 in a is i[0] == 7, which is true since a[7] == 0.

So, in your example, say if you have an extra vector b, you can do as in below:

>>> b
array([5, 9, 4, 8, 6, 1, 7, 2, 3, 0])
>>> i[b]
array([9, 8, 5, 3, 6, 2, 1, 4, 0, 7])

which means that, say, b[0] == 5 and index of 5 in a is i[b][0] == 9, which is true, since a[9] = 5 = b[0].

Upvotes: 3

Related Questions