Reputation: 255
I have two data frames with four and two columns. For example:
A B C D
0 4 2 320 700
1 5 7 400 800
2 2 6 300 550
3 4 6 100 300
4 5 2 250 360
and
A B
0 2 4
1 5 7
2 2 5
I need to compare the first data frame with the second data frame and if column A and column B in the second data frame was in column A and column B in the first data frame. (order doesn't matter. it means in the first data frame in the first row A is 4, B is 2 and in the second data frame is A is 2 and B is 4 and it's not important but both numbers should be in the columns) keep the whole row in the first data frame; otherwise remove the row. so the output will be :
A B C D
0 4 2 320 700
1 5 7 400 800
2 5 2 250 360
How can I get this output (my actual data frames are so huge and can't iterate through them so need a fast efficient way)?
Upvotes: 4
Views: 398
Reputation: 59304
Using np.equal.outer
arr = np.equal.outer(df, df2)
df.loc[arr.any(1).all(-1).any(-1)]
Outputs
A B C D
0 4 2 320 700
1 5 7 400 800
4 5 2 250 360
Upvotes: 0
Reputation: 323396
What I will do using frozenset
+ isin
yourdf=df[df[['A','B']].apply(frozenset,1).isin(df1.apply(frozenset,1))].copy()
A B C D
0 4 2 320 700
1 5 7 400 800
4 5 2 250 360
Upvotes: 1
Reputation: 403278
I would do this by first sorting, then performing an LEFT OUTER JOIN using merge
with an indicator to determine which rows to keep. Example,
u = df.loc[:, ['A', 'B']]
u.values.sort() # sort columns of `u`
df2.values.sort() # sort columns of `df2`
df[u.merge(df2, how='left', indicator='ind').eval('ind == "both"').values]
A B C D
0 4 2 320 700
1 5 7 400 800
4 5 2 250 360
More info on joins with indicator can be found in my post: Pandas Merging 101
If you don't care about the final result being sorted or not, you can simplify this to an inner join.
df[['A', 'B']] = np.sort(df[['A', 'B']])
df2[:] = np.sort(df2)
df.merge(df2, on=['A', 'B'])
A B C D
0 2 4 320 700
1 5 7 400 800
2 2 5 250 360
Upvotes: 2