Sudipta Lal Basu

Reputation: 35

Searching in numpy array

I have a 2D numpy array A, sorted by column 0, e.g.:

Col.0 Col.1 Col.2
10 2.45 3.25
11 2.95 4
12 3.45 4.25
15 3.95 5
18 4.45 5.25
21 4.95 6
23 5.45 6.25
27 5.95 7
29 6.45 7.25
32 6.95 8
35 7.45 8.25

The entries in each row are unique: Col. 0 is the identification number of a point in the xy plane, and columns 1 and 2 are the x and y coordinates of that point. I have another array B, whose rows can contain duplicate data. Its columns 0 and 1 store x and y coordinates.

Col.0 Col.1
2.45 3.25
4.45 5.25
6.45 7.25
2.45 3.25

My aim is to find, for each row of B, the index of the matching row in A, without using a for loop. In this case, the output should be [0,4,8,0]. I know that numpy searchsorted can look up multiple values in one shot, but it can only compare against a single column of A, not multiple columns. Is there a way to do this?

Upvotes: 2

Views: 541

Answers (2)

Naphat Amundsen

Reputation: 1623

Pure numpy solution:

My intuition is to take the difference between a[:,1:] and b by broadcasting, which gives an array of shape (11, 4, 2). Where a row of a matches a row of b, both differences are zero. Comparing the difference with False gives a boolean mask c (0.0 == False is True). Then c.all(2) results in a boolean array of shape (11, 4), where each True element represents a match between a row of a and a row of b. Finally, np.nonzero returns the indices of those elements.

import numpy as np

a = np.array([
    [10, 2.45, 3.25],
    [11, 2.95, 4],
    [12, 3.45, 4.25],
    [15, 3.95, 5],
    [18, 4.45, 5.25],
    [21, 4.95, 6],
    [23, 5.45, 6.25],
    [27, 5.95, 7],
    [29, 6.45, 7.25],
    [32, 6.95, 8],
    [35, 7.45, 8.25],
])

b = np.array([
    [2.45, 3.25],
    [4.45, 5.25],
    [6.45, 7.25],
    [2.45, 3.25],
])

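# Broadcast the (x, y) columns of a against b: shape (11, 4, 2), True where the difference is exactly 0
c = (a[:,np.newaxis,1:]-b) == False
# A (row of a, row of b) pair matches when both coordinates are equal
rows, cols = c.all(2).nonzero()
# Reorder the matching row indices of a to follow the row order of b
print(rows[cols.argsort()])
# [0 4 8 0]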
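The snippet above relies on exact floating-point equality and on every row of b matching exactly one row of a. If the coordinates come from a computation rather than being copied verbatim, a tolerance-based comparison may be safer. The following is only a sketch of that variant, not part of the original answer: the helper name match_rows, the tolerance atol=1e-9, and the convention of returning -1 for unmatched rows are all assumptions made for illustration.

```python
import numpy as np

def match_rows(a_xy, b_xy, atol=1e-9):
    """For each row of b_xy, return the index of the first matching row
    of a_xy, or -1 if no row matches (hypothetical helper)."""
    # Compare every row of b with every row of a: shape (len(b), len(a), 2).
    close = np.isclose(b_xy[:, np.newaxis, :], a_xy[np.newaxis, :, :],
                       rtol=0.0, atol=atol)
    # A pair matches when both coordinates are within the tolerance.
    hits = close.all(axis=2)              # shape (len(b), len(a))
    # argmax returns the position of the first True in each row of hits.
    idx = hits.argmax(axis=1)
    # Rows of b without any match would otherwise report index 0.
    idx[~hits.any(axis=1)] = -1
    return idx

a = np.array([
    [10, 2.45, 3.25], [11, 2.95, 4], [12, 3.45, 4.25], [15, 3.95, 5],
    [18, 4.45, 5.25], [21, 4.95, 6], [23, 5.45, 6.25], [27, 5.95, 7],
    [29, 6.45, 7.25], [32, 6.95, 8], [35, 7.45, 8.25],
])
b = np.array([[2.45, 3.25], [4.45, 5.25], [6.45, 7.25], [2.45, 3.25]])

print(match_rows(a[:, 1:], b))
# [0 4 8 0]
```

Putting b on the first axis means the result already follows the row order of b, so no argsort step is needed. Like the original, this builds an intermediate array of size len(a) x len(b) x 2, so it is best suited to moderately sized inputs.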

Upvotes: 1

Ehsan

Reputation: 12397

You can use merge in pandas (here df1 and df2 are DataFrames holding A and B, with the column names shown in the question):

df2.merge(df1.reset_index(),how='left',left_on=['Col.0','Col.1'],right_on=['Col.1','Col.2'])['index']

output:

0    0
1    4
2    8
3    0
Name: index, dtype: int64

And if you want it as an array:

df2.merge(df1.reset_index(),how='left',left_on=['Col.0','Col.1'],right_on=['Col.1','Col.2'])['index'].to_numpy()
#array([0, 4, 8, 0])
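The snippets above do not show how df1 and df2 are built. A minimal self-contained sketch follows, assuming they are created directly from the arrays in the question and use its column labels (Col.0, Col.1, Col.2 for A and Col.0, Col.1 for B). The variable names merged and idx are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

a = np.array([
    [10, 2.45, 3.25], [11, 2.95, 4], [12, 3.45, 4.25], [15, 3.95, 5],
    [18, 4.45, 5.25], [21, 4.95, 6], [23, 5.45, 6.25], [27, 5.95, 7],
    [29, 6.45, 7.25], [32, 6.95, 8], [35, 7.45, 8.25],
])
b = np.array([[2.45, 3.25], [4.45, 5.25], [6.45, 7.25], [2.45, 3.25]])

# Assumed construction of df1 and df2; column labels follow the question.
df1 = pd.DataFrame(a, columns=['Col.0', 'Col.1', 'Col.2'])
df2 = pd.DataFrame(b, columns=['Col.0', 'Col.1'])

# reset_index() exposes the row number of A as a regular column named 'index'.
# how='left' keeps one output row per row of B, in B's order.
merged = df2.merge(df1.reset_index(), how='left',
                   left_on=['Col.0', 'Col.1'], right_on=['Col.1', 'Col.2'])
idx = merged['index'].to_numpy()
print(idx)
```

Note that the merge matches on exact float equality, and that with how='left' a row of B with no match in A yields NaN in the index column, which turns that column into floats.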

Upvotes: 0
