Different results using numpy.in1d() with an array and with its single elements

Question

I'm writing some Python code and have run into a problem. I have two arrays, A and B, both containing IDs. A holds all the IDs, and B holds the IDs belonging to one group. I'm trying to get the positions of the elements of B in A with this code:

>>> print B
[11600813 11600877 11600941 ..., 13432165 13432229 13434277]
>>> mask=np.nonzero(np.in1d(A, B))
>>> print A[mask]
[12966245 12993389 12665837 ..., 13091877 12965029 13091813]

But this is clearly wrong, since I'm not recovering the values of B. To check whether I was using numpy.in1d() correctly, I tried:

>>> mask=np.nonzero(np.in1d(A, B[0]))
>>> print A[mask]
[11600813]

which is right, so I'm guessing there is a problem with 'B' in numpy.in1d(). I tried using the boolean array from np.in1d(A, B) directly instead of converting it to indices, but that didn't work. I also tried B = numpy.array(B) and B = list(B), and neither worked.

But if I do B = numpy.array(B)[0] or B = list(B)[0], it still works for that element. Unfortunately I can't run a 'for' loop over each element, because len(A) is 16777216 and len(B) is 9166, so it takes a lot of time.

I also made sure that all elements of B are in A:

>>> np.intersect1d(A, B)
[11600813 11600877 11600941 ..., 13432165 13432229 13434277]

HYRY · Accepted Answer

You can use numpy.argsort and numpy.searchsorted to get the positions:

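import numpy as np
A = np.unique(np.random.randint(0, 100, 100))
B = np.random.choice(A, 10)

idxA = np.argsort(A)                   # permutation that sorts A
sortedA = A[idxA]
idxB = np.searchsorted(sortedA, B)     # position of each B value in sorted A
pos = idxA[idxB]                       # map back to positions in the original A
print(A[pos])                          # same values as B, in B's order
print(B)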
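The snippet above assumes that every element of B occurs in A. The same argsort/searchsorted idea with a guard for missing values might look like the sketch below. The helper name positions_in and the -1 sentinel are illustrative choices, not part of NumPy, and the sketch assumes A is a non-empty 1-D array of unique values.

```python
import numpy as np

def positions_in(A, B):
    """For each element of B, return its index in A, or -1 if it is absent.

    Hypothetical helper; assumes A is non-empty, 1-D, with unique values.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    idxA = np.argsort(A)                  # permutation that sorts A
    sortedA = A[idxA]
    idxB = np.searchsorted(sortedA, B)    # insertion points in sorted A
    # searchsorted returns len(A) for values larger than every element,
    # so clip before indexing.
    idxB = np.minimum(idxB, len(A) - 1)
    found = sortedA[idxB] == B            # True where the value really exists
    return np.where(found, idxA[idxB], -1)

A = np.array([40, 10, 30, 20])
B = np.array([30, 10, 99, 40])
pos = positions_in(A, B)
print(pos)
```

Without the check, a value that is not in A would silently map to the position of a neighbouring value, or raise an IndexError if it is larger than everything in A.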

If you want a faster method, consider using pandas:

import pandas as pd
s = pd.Index(A)
pos = s.get_indexer(B)   # position in A of each element of B
print(A[pos])
print(B)
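As for the original attempt: np.in1d(A, B) yields a boolean mask aligned with A, so A[mask] lists the matches in the order they occur in A, not in the order of B. That may be why the output looked wrong even though every printed value belongs to B. A small sketch with made-up arrays, using np.isin (the newer spelling, which behaves like np.in1d for 1-D input):

```python
import numpy as np

# Made-up data: A is unsorted, B is a subset of A in a different order.
A = np.array([40, 10, 30, 20, 50])
B = np.array([10, 20, 40])

mask = np.nonzero(np.isin(A, B))   # indices into A where the value is in B
in_A_order = A[mask]               # matches, listed in A's order

print(in_A_order)
print(B)
# Same set of values, different order:
same_values = np.array_equal(np.sort(in_A_order), np.sort(B))
print(same_values)
```

The mask approach is enough when only membership matters; the searchsorted and get_indexer approaches are the ones that give positions aligned with B.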
