Reputation: 35
I have a 2D NumPy array, say A, sorted with respect to Column 0, e.g.:
Col.0 | Col.1 | Col.2 |
---|---|---|
10 | 2.45 | 3.25 |
11 | 2.95 | 4 |
12 | 3.45 | 4.25 |
15 | 3.95 | 5 |
18 | 4.45 | 5.25 |
21 | 4.95 | 6 |
23 | 5.45 | 6.25 |
27 | 5.95 | 7 |
29 | 6.45 | 7.25 |
32 | 6.95 | 8 |
35 | 7.45 | 8.25 |
Each row is unique: Col. 0 is the identification number of a point in the xy-plane, and Columns 1 and 2 are the x and y coordinates of that point. I have another array B (its rows can contain duplicate data), where Column 0 and Column 1 store x and y coordinates.
Col.0 | Col.1 |
---|---|
2.45 | 3.25 |
4.45 | 5.25 |
6.45 | 7.25 |
2.45 | 3.25 |
My aim is to find the row index in array A corresponding to each row of array B, without using a for loop. So, in this case, my output should be [0,4,8,0].
I know that numpy searchsorted can look up multiple values in one shot, but it can only compare against a single column of A, not multiple columns. Is there a way to do this?
Upvotes: 2
Views: 541
Reputation: 1623
My intuition is that I take the difference `c` between `a[:,1:]` and `b` by broadcasting, such that `c` is of shape `(11, 4, 2)`. The rows that match will be all zeros. Then I do `c == False` to obtain a mask. I do `c.all(2)`, which results in a boolean array of shape `(11, 4)`, where all `True` elements represent matches between `a` and `b`. Then I simply use `np.nonzero` to obtain the indices of said elements.
import numpy as np
a = np.array([
[10, 2.45, 3.25],
[11, 2.95, 4],
[12, 3.45, 4.25],
[15, 3.95, 5],
[18, 4.45, 5.25],
[21, 4.95, 6],
[23, 5.45, 6.25],
[27, 5.95, 7],
[29, 6.45, 7.25],
[32, 6.95, 8],
[35, 7.45, 8.25],
])
b = np.array([
[2.45, 3.25],
[4.45, 5.25],
[6.45, 7.25],
[2.45, 3.25],
])
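# Broadcast the (x, y) columns of a against every row of b: shape (11, 4, 2).
# For numbers, `== False` is the same as `== 0`, so c marks exact matches.
c = (a[:, np.newaxis, 1:] - b) == False
# c.all(2) is True where both coordinates match;
# nonzero() returns (row index in a, row index in b) for each match.
rows, cols = c.all(2).nonzero()
# Reorder the row indices of a so that they follow the order of the rows in b.
print(rows[cols.argsort()])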
# [0 4 8 0]
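One caveat with the exact comparison above: it relies on the coordinates in `a` and `b` being bit-for-bit identical floats, and it assumes every row of `b` matches exactly one row of `a`. If the coordinates come from separate computations, a tolerance-based comparison may be safer. The following is only a sketch under those assumptions; `match_rows` and the `atol` value are hypothetical names and choices of mine, not part of the original answer, and the data is a shortened version of the question's arrays with one deliberately unmatched row.

```python
import numpy as np

a = np.array([
    [10, 2.45, 3.25],
    [18, 4.45, 5.25],
    [29, 6.45, 7.25],
])
b = np.array([
    [4.45, 5.25],
    [2.45, 3.25],
    [9.99, 9.99],  # deliberately absent from a
    [4.45, 5.25],
])

def match_rows(a_xy, b_xy, atol=1e-9):
    """Hypothetical helper: for each row of b_xy, return the index of the
    first row of a_xy that matches within atol, or -1 if there is none."""
    # close[i, j] is True when row i of b_xy matches row j of a_xy.
    close = np.isclose(
        b_xy[:, np.newaxis, :], a_xy[np.newaxis, :, :], rtol=0.0, atol=atol
    ).all(axis=2)
    idx = close.argmax(axis=1)          # first True per row (0 if none)
    idx[~close.any(axis=1)] = -1        # flag rows of b_xy with no match
    return idx

result = match_rows(a[:, 1:], b)
print(result.tolist())  # [1, 0, -1, 1]
```

Like the broadcasting version, this builds an intermediate array of size `len(b) * len(a) * 2`, so it is best suited to arrays of moderate size.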
Upvotes: 1
Reputation: 12397
You can use merge in pandas, assuming df1 and df2 are DataFrames holding A and B with the column names shown in the question:
df2.merge(df1.reset_index(),how='left',left_on=['Col.0','Col.1'],right_on=['Col.1','Col.2'])['index']
output:
0 0
1 4
2 8
3 0
Name: index, dtype: int64
If you would like it as an array:
df2.merge(df1.reset_index(),how='left',left_on=['Col.0','Col.1'],right_on=['Col.1','Col.2'])['index'].to_numpy()
#array([0, 4, 8, 0])
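For completeness, here is a self-contained sketch of the same merge idea starting from the NumPy arrays in the question. The column names `id`, `x` and `y` are my own choice (an assumption, not from the answer); using the same key names in both frames avoids the overlapping `Col.0`/`Col.1` labels and lets `on=` replace the `left_on`/`right_on` pair.

```python
import numpy as np
import pandas as pd

a = np.array([
    [10, 2.45, 3.25],
    [11, 2.95, 4],
    [12, 3.45, 4.25],
    [15, 3.95, 5],
    [18, 4.45, 5.25],
    [21, 4.95, 6],
    [23, 5.45, 6.25],
    [27, 5.95, 7],
    [29, 6.45, 7.25],
    [32, 6.95, 8],
    [35, 7.45, 8.25],
])
b = np.array([
    [2.45, 3.25],
    [4.45, 5.25],
    [6.45, 7.25],
    [2.45, 3.25],
])

# Hypothetical column names; any names work as long as the keys agree.
df1 = pd.DataFrame(a, columns=["id", "x", "y"])
df2 = pd.DataFrame(b, columns=["x", "y"])

# reset_index() exposes the row number of A as a regular column named "index";
# a left merge keeps the rows of df2 in their original order.
merged = df2.merge(df1.reset_index(), how="left", on=["x", "y"])
indices = merged["index"].to_numpy()
print(indices.tolist())  # [0, 4, 8, 0]
```

Note that the merge compares the float keys for exact equality, so the same caveat about floating-point coordinates applies here.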
Upvotes: 0