Gopinath S
Gopinath S

Reputation: 561

Pandas: Extract rows of a DF when a column value matches with a column value of another DF

I have two DF1 and DF2 as mentioned below. The first column 'POS' of both dataframe might have matches but other columns will be different. I want to compare the 'POS' column of both dataframes, if a 'POS' value of DF1 is in DF2 'POS' column the I want to store that row in new DF1 dataframe and do the same for DF2. I could this easily with a dictionary by keeping POS as keys and compare them to get corresponding values. But the dictionary will not accept duplicate 'POS' values so I am wondering if there is a solution in Pandas DF.

df1 = 
   POS   id    freq
0 100 "idex" 3.0
1 102 "ter" 2.0
2 102 "pec" 4.0
3 103 "jek" 4.0
4 104 "jek" 4.0


df2 = 
   POS   id    freq
0 100 "treg" 3.0
1 102 "dfet" 2.2
2 102 "idet" 7.0
3 108 "jeik" 1.0
4 109 "jek" 4.0

Expected:
new_df1 = 
   POS   id    freq
0 100 "idex" 3.0
1 102 "ter" 2.0
2 102 "pec" 4.0


new_df2 = 
   POS   id    freq
0 100 "treg" 3.0
1 102 "dfet" 2.2
2 102 "idet" 7.0

Upvotes: 0

Views: 2695

Answers (2)

Jonathan
Jonathan

Reputation: 846

I believe you are describing a classic join problem. I would recommend the .merge() method:

df = pd.merge(df1, df2, how='left', on='POS')

this will return a new data frame with df1.POS as index. all columns from df2 will be in df1, but only for POS values that match. You can play around with the how= parameter in oder to get what you need. For more information, see types of sql joins

Upvotes: 1

sacuL
sacuL

Reputation: 51335

You can use isin for both dataframes:

new_df1 = df1[df1.POS.isin(df2.POS)]

new_df2 = df2[df2.POS.isin(df1.POS)]

>>> new_df1
   POS    id  freq
0  100  idex   3.0
1  102   ter   2.0
2  102   pec   4.0
>>> new_df2
   POS    id  freq
0  100  treg   3.0
1  102  dfet   2.2
2  102  idet   7.0

Upvotes: 5

Related Questions