Reputation: 543
I have following two dataframes:
df1:
name
abc
lmn
pqr
df2:
m_name n_name loc
abc tyu IND
bcd abc RSA
efg poi SL
lmn ert AUS
nne bnm ENG
pqr lmn NZ
xyz asd BAN
I want to generate a new dataframe on following condition:
if df2.m_name==df1.name or df2.n_name==df1.name
eliminate duplicate rows
Following is desired output:
m_name n_name loc
abc tyu IND
bcd abc RSA
lmn ert AUS
pqr lmn NZ
Can I get any suggestions on how to achieve this??
Upvotes: 2
Views: 67
Reputation: 863771
Use:
print (df2)
m_name n_name loc
0 abc tyu IND
1 abc tyu IND
2 bcd abc RSA
3 efg poi SL
4 lmn ert AUS
5 nne bnm ENG
6 pqr lmn NZ
7 xyz asd BAN
df3 = df2.filter(like='name')
#another solution is filter columns by columns names in list
#df3 = df2[['m_name','n_name']]
df = df2[df3.isin(df1['name'].tolist()).any(axis=1)]
df = df.drop_duplicates(df3.columns)
print (df)
m_name n_name loc
0 abc tyu IND
2 bcd abc RSA
4 lmn ert AUS
6 pqr lmn NZ
Details:
Seelct all columns with name
by filter
:
print (df2.filter(like='name'))
m_name n_name
0 abc tyu
1 abc tyu
2 bcd abc
3 efg poi
4 lmn ert
5 nne bnm
6 pqr lmn
7 xyz asd
Compare by DataFrame.isin
:
print (df2.filter(like='name').isin(df1['name'].tolist()))
m_name n_name
0 True False
1 True False
2 False True
3 False False
4 True False
5 False False
6 True True
7 False False
Get at least one True
per row by any
:
print (df2.filter(like='name').isin(df1['name'].tolist()).any(axis=1))
0 True
1 True
2 True
3 False
4 True
5 False
6 True
7 False
dtype: bool
Filter by boolean indexing
:
df = df2[df2.filter(like='name').isin(df1['name'].tolist()).any(axis=1)]
print (df)
m_name n_name loc
0 abc tyu IND
1 abc tyu IND
2 bcd abc RSA
4 lmn ert AUS
6 pqr lmn NZ
And last remove duplicates drop_duplicates
(If need remove dupes by all name
columns add subset
parameter)
df = df.drop_duplicates(subset=df3.columns)
print (df)
m_name n_name loc
0 abc tyu IND
2 bcd abc RSA
4 lmn ert AUS
6 pqr lmn NZ
Upvotes: 2
Reputation: 77027
Use
In [56]: df2[df2.m_name.isin(df1.name) | df2.n_name.isin(df1.name)]
Out[56]:
m_name n_name loc
0 abc tyu IND
1 bcd abc RSA
3 lmn ert AUS
5 pqr lmn NZ
Or using query
In [58]: df2.query('m_name in @df1.name or n_name in @df1.name')
Out[58]:
m_name n_name loc
0 abc tyu IND
1 bcd abc RSA
3 lmn ert AUS
5 pqr lmn NZ
Upvotes: 2