Mohammad Saifullah
Mohammad Saifullah

Reputation: 1143

Comparing 2 columns of two Python Pandas dataframes and getting the common rows

I have 2 Dataframe as follows:

DF1=
    A    B   C    D
0   AA   BA  KK   0
1   AD   BD  LL   0
2   AF   BF  MM   0

DF2=
    K    L
0   AA   BA
1   AD   BF
2   AF   BF

At the end what I want to get is:

DF1=
    A    B   C    D
0   AA   BA  KK   1
1   AD   BD  LL   0
2   AF   BF  MM   1

So, I want to compare two dataframe, I want to see which rows of first data frame (for column A and B) are in common of of second dataframe(Column K and L) and assign 1 on the coulmn D of first dataframe.

I can use for loop, but It will be very slow for large number of entries.

Any clue or suggestion will be appreciated.

Upvotes: 9

Views: 46602

Answers (4)

PiotrKu
PiotrKu

Reputation: 21

DF1.merge(right=DF2, left_on=[DF1.A, DF1.B], right_on=[DF2.K, DF2.L], indicator=True, how='left')

gives:

A   B   C  D    K    L     _merge
0  AA  BA  KK  0   AA   BA       both
1  AD  BD  LL  0  NaN  NaN  left_only
2  AF  BF  MM  0   AF   BF       both

So, as above, indicator does the job.

Upvotes: 2

Vipul Saxena
Vipul Saxena

Reputation: 71

df1['ColumnName'].isin(df2['ColumnName']).value_counts()

Upvotes: 5

Mohammad Saifullah
Mohammad Saifullah

Reputation: 1143

This is how I solved it:

df1 = pd.DataFrame({"A":['AA','AD','AD'], "B":['BA','BD','BF']})
df2 = pd.DataFrame({"A":['AA','AD'], 'B':['BA','BF']})
df1['compressed']=df1.apply(lambda x:'%s%s' % (x['A'],x['B']),axis=1)
df2['compressed']=df2.apply(lambda x:'%s%s' % (x['A'],x['B']),axis=1)
df1['Success'] = df1['compressed'].isin(df2['compressed']).astype(int)
print df1

    A   B     compressed   Success
0  AA  BA      AABA          1
1  AD  BD      ADBD          0
2  AD  BF      ADBF          1

Upvotes: 2

EdChum
EdChum

Reputation: 394459

This would be easier if you renamed the columns of df2 and then you can compare row-wise:

In [35]:

df2.columns = ['A', 'B']
df2
Out[35]:
    A   B
0  AA  BA
1  AD  BF
2  AF  BF
In [38]:

df1['D'] = (df1[['A', 'B']] == df2).all(axis=1).astype(int)
df1
Out[38]:
    A   B   C  D
0  AA  BA  KK  1
1  AD  BD  LL  0
2  AF  BF  MM  1

Upvotes: 17

Related Questions