Selecting dataframe rows based on values in other dataframe

Question

I have following two dataframes:

df1:

name
abc
lmn
pqr

df2:

m_name  n_name  loc
abc     tyu     IND
bcd     abc     RSA
efg     poi     SL      
lmn     ert     AUS
nne     bnm     ENG
pqr     lmn     NZ
xyz     asd     BAN

I want to generate a new dataframe on following condition:

if df2.m_name==df1.name or df2.n_name==df1.name
eliminate duplicate rows

Following is desired output:

m_name  n_name  loc
abc     tyu     IND
bcd     abc     RSA
lmn     ert     AUS
pqr     lmn     NZ

Can I get any suggestions on how to achieve this??

jezrael · Accepted Answer

Use:

print (df2)
  m_name n_name  loc
0    abc    tyu  IND
1    abc    tyu  IND
2    bcd    abc  RSA
3    efg    poi   SL
4    lmn    ert  AUS
5    nne    bnm  ENG
6    pqr    lmn   NZ
7    xyz    asd  BAN

df3 = df2.filter(like='name')
#another solution is filter columns by columns names in list
#df3 = df2[['m_name','n_name']]
df = df2[df3.isin(df1['name'].tolist()).any(axis=1)]
df = df.drop_duplicates(df3.columns)
print (df)
  m_name n_name  loc
0    abc    tyu  IND
2    bcd    abc  RSA
4    lmn    ert  AUS
6    pqr    lmn   NZ

Details:

Seelct all columns with name by filter:

print (df2.filter(like='name'))
  m_name n_name
0    abc    tyu
1    abc    tyu
2    bcd    abc
3    efg    poi
4    lmn    ert
5    nne    bnm
6    pqr    lmn
7    xyz    asd

Compare by DataFrame.isin:

print (df2.filter(like='name').isin(df1['name'].tolist()))
   m_name  n_name
0    True   False
1    True   False
2   False    True
3   False   False
4    True   False
5   False   False
6    True    True
7   False   False

Get at least one True per row by any:

print (df2.filter(like='name').isin(df1['name'].tolist()).any(axis=1))
0     True
1     True
2     True
3    False
4     True
5    False
6     True
7    False
dtype: bool

Filter by boolean indexing:

df = df2[df2.filter(like='name').isin(df1['name'].tolist()).any(axis=1)]
print (df)
  m_name n_name  loc
0    abc    tyu  IND
1    abc    tyu  IND
2    bcd    abc  RSA
4    lmn    ert  AUS
6    pqr    lmn   NZ

And last remove duplicates drop_duplicates (If need remove dupes by all name columns add subset parameter)

df = df.drop_duplicates(subset=df3.columns)
print (df)
  m_name n_name  loc
0    abc    tyu  IND
2    bcd    abc  RSA
4    lmn    ert  AUS
6    pqr    lmn   NZ

Selecting dataframe rows based on values in other dataframe

Answers (2)

Related Questions