Will

Reputation: 24699

Get NumPy Array Indices in Array B for Unique Values in Array A, for Values Present in Both Arrays, Aligned with Array A

I have two NumPy arrays:

import numpy as np
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

I want a function that takes A, B as parameters, and returns the index in B for each value in A, aligned with A, as efficiently as possible.

This is the expected output for the test case above:

indices = [1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]

I've tried exploring in1d(), where(), and nonzero() with no luck. Any help is much appreciated.

Edit: the arrays contain strings.

Upvotes: 2

Views: 990

Answers (5)

Eelco Hoogendoorn

Reputation: 10759

The numpy_indexed package (disclaimer: I am its author) implements a solution along the same lines as Jaime's, but with a nice interface, tests, and a lot of related useful functionality:

import numpy_indexed as npi
print(npi.indices(B, A))

Upvotes: 1

rtrwalker

Reputation: 1021

I'm not sure how efficient this is, but it works:

import numpy as np
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
idx_of_a_in_b = np.argmax(A[np.newaxis, :] == B[:, np.newaxis], axis=0)
print(idx_of_a_in_b)

from which I get:

[1 1 0 2 2 2 2 2 3 4 3 3 4]
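One caveat with this approach: the broadcast comparison builds a len(B) by len(A) boolean matrix, so memory grows with the product of the two sizes, and np.argmax returns 0 for a column that contains no True, so a value of A that is missing from B silently maps to index 0. A minimal sketch that flags such values instead, assuming -1 is an acceptable "not found" marker (`broadcast_indices` and its `missing` parameter are hypothetical names):

```python
import numpy as np

def broadcast_indices(A, B, missing=-1):
    """Index in B of each element of A via a full pairwise comparison (hypothetical helper)."""
    A = np.asarray(A)
    B = np.asarray(B)
    # matches[j, i] is True when B[j] == A[i]; shape (len(B), len(A)).
    matches = A[np.newaxis, :] == B[:, np.newaxis]
    # argmax picks the first True in each column (or 0 if the column has none).
    idx = np.argmax(matches, axis=0)
    # Overwrite the columns that had no match at all with the sentinel.
    idx[~matches.any(axis=0)] = missing
    return idx

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(broadcast_indices(A, B))           # [1 1 0 2 2 2 2 2 3 4 3 3 4]
print(broadcast_indices(['4', '5'], B))  # [ 1 -1]
```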

Upvotes: 0

Daniel

Reputation: 19547

You can also do:

>>> np.digitize(A,B)-1
array([1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4])

According to the docs, you should be able to pass right=True and skip the minus one. This does not work for me, likely due to a version issue, as I do not have NumPy 1.7.
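np.digitize only gives exact indices when B is sorted in increasing order and every value of A appears in B. Since these arrays hold strings, which compare lexicographically ('16' sorts before '2'), a minimal sketch that converts both arrays to integers first avoids relying on string ordering; it assumes every string is an integer literal. It also shows both conventions: with the default right=False an exact match on B[i] returns i + 1 (hence the minus one), while right=True returns i directly:

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

# Assumption: every string parses as an integer, so numeric order is meaningful.
A_num = A.astype(int)
B_num = B.astype(int)  # must be monotonically increasing for digitize

# right=False (default): B[i-1] <= x < B[i] gives i, so an exact match on B[i] gives i + 1.
via_minus_one = np.digitize(A_num, B_num) - 1
# right=True: B[i-1] < x <= B[i] gives i, so an exact match on B[i] gives i.
via_right = np.digitize(A_num, B_num, right=True)

print(via_minus_one)  # [1 1 0 2 2 2 2 2 3 4 3 3 4]
print(via_right)      # [1 1 0 2 2 2 2 2 3 4 3 3 4]
```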

I'm not sure what you are using this for, but a simple and very fast alternative is to let np.unique build B for you:

>>> A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
>>> B,indices=np.unique(A,return_inverse=True)
>>> B
array(['16', '2', '32', '4', '8'],
      dtype='|S2')
>>> indices
array([3, 3, 1, 4, 4, 4, 4, 4, 0, 2, 0, 0, 2])

>>> B[indices]
array(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'],
      dtype='|S2')

The order of B will be different (np.unique returns sorted values, which for strings means lexicographic order), but this can be changed if needed.
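If B holds exactly the unique values of A (as in the question), one way to map back to B's own order is through np.argsort(B): B[np.argsort(B)] lists B's values in the same sorted order that np.unique produces, so np.argsort(B)[k] is the position in B of the k-th unique value. A minimal sketch under that assumption (`indices_via_unique` is a hypothetical name):

```python
import numpy as np

def indices_via_unique(A, B):
    """Index in B of each element of A; assumes B holds exactly the unique values of A."""
    A = np.asarray(A)
    B = np.asarray(B)
    uniq, inverse = np.unique(A, return_inverse=True)
    # order[k] is the position in B of the k-th smallest value of B.
    order = np.argsort(B)
    # Check the assumption: sorting B must reproduce np.unique's output exactly.
    if not np.array_equal(B[order], uniq):
        raise ValueError("B must contain exactly the unique values of A")
    # inverse indexes into uniq; order translates each uniq position into a B position.
    return order[inverse]

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(indices_via_unique(A, B))  # [1 1 0 2 2 2 2 2 3 4 3 3 4]
```

The check costs one extra sort of B, which is usually small compared with A; drop it only if you already know B and A's unique values coincide.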

Upvotes: 3

Jaime

Reputation: 67427

I think you can do it with np.searchsorted:

>>> A = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
>>> B = np.asarray([2, 8, 4, 32, 16])
>>> sort_b = np.argsort(B)
>>> idx_of_a_in_sorted_b = np.searchsorted(B, A, sorter=sort_b)
>>> idx_of_a_in_b = np.take(sort_b, idx_of_a_in_sorted_b)
>>> idx_of_a_in_b
array([2, 2, 0, 1, 1, 1, 1, 1, 4, 3, 4, 4, 3], dtype=int64)

Note that B is scrambled compared with your version, hence the different output. If some of the items in A are not in B (which you can check with np.all(np.in1d(A, B)), or np.isin in newer NumPy versions), the indices returned for those values will be meaningless, and you may even get an IndexError from the last line if some value in A is larger than every value in B (searchsorted then returns len(B)).
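Because searchsorted always returns some position, even for values not present in B, one way to make missing values explicit is to clip the positions into range and then check that B actually holds A's value at each position. A minimal sketch, assuming B is non-empty and that -1 is an acceptable "not found" marker (`searchsorted_indices` and its `missing` parameter are hypothetical names):

```python
import numpy as np

def searchsorted_indices(A, B, missing=-1):
    """Index in B of each element of A, or `missing` where A's value is absent from B."""
    A = np.asarray(A)
    B = np.asarray(B)
    sort_b = np.argsort(B)
    # Insertion points of A into the sorted view of B; can equal len(B).
    pos = np.searchsorted(B, A, sorter=sort_b)
    # Clip so the lookup below can never raise an IndexError.
    pos = np.clip(pos, 0, len(B) - 1)
    idx = sort_b[pos]
    # A position only counts as a match if B holds exactly that value there.
    return np.where(B[idx] == A, idx, missing)

A = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
B = np.asarray([2, 8, 4, 32, 16])
print(searchsorted_indices(A, B))           # [2 2 0 1 1 1 1 1 4 3 4 4 3]
print(searchsorted_indices([4, 5, 64], B))  # [ 2 -1 -1]
```

This keeps the cost of the original approach (one sort of B plus a binary search per element of A) and works the same way on the question's string arrays.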

Upvotes: 1

ovgolovin

Reputation: 13410

For this kind of problem, it is important for lookups in B to be as fast as possible. A dictionary provides O(1) average-case lookup time, so first of all, let us construct one:

>>> indices = dict((value,index) for index,value in enumerate(B))
>>> indices
{8: 2, 16: 3, 2: 0, 4: 1, 32: 4}

And then just go through A and find corresponding indices:

>>> [indices[item] for item in A]
[1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]
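The dictionary approach extends naturally to values of A that are missing from B, via dict.get with a default, and np.fromiter turns the result back into a NumPy integer array. A minimal sketch, assuming -1 is an acceptable "not found" marker (`dict_indices` and its `missing` parameter are hypothetical names); note that if B contains duplicate values, the dictionary keeps the last index of each:

```python
import numpy as np

def dict_indices(A, B, missing=-1):
    """Index in B of each element of A using a hash map (hypothetical helper)."""
    # Build the value -> index map once, in O(len(B)) time.
    lookup = {value: index for index, value in enumerate(B)}
    # One average-case O(1) lookup per element of A; unknown values get `missing`.
    return np.fromiter((lookup.get(item, missing) for item in A),
                       dtype=np.intp, count=len(A))

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(dict_indices(A, B))           # [1 1 0 2 2 2 2 2 3 4 3 3 4]
print(dict_indices(['4', '7'], B))  # [ 1 -1]
```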

Upvotes: 1
