Reputation: 421
I have two dataframes with two of its columns that really matter. One of the column consists of float64 values and the other is string. the dataframes are of different sizes.
I would like to match both the Number
column and the Item
column at the same time and then get only the ones that match.
df1 = pd.DataFrame({ 'Number':[1.0,3.0,4.0,5.0,8.0,12.0,32.0,58.0] , 'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil','toolkit','box','fork']})
df2 = pd.DataFrame({'Number':[3.0,4.0,8.0,12.0,15.0,32.0,54.0,58.0,72.0], 'Item':['Watch','Pen','Pencil','Eraser','bottle','box','toolkit','fork','Phone']})
df1
Number Item
0 1.0 Phone
1 3.0 Watch
2 4.0 Pen
3 5.0 Pencil
4 8.0 Pencil
5 12.0 toolkit
6 32.0 box
7 58.0 fork
df2
Number Item
0 3.0 Watch
1 4.0 Pen
2 8.0 Pencil
3 12.0 Eraser
4 15.0 bottle
5 32.0 box
6 54.0 toolkit
7 58.0 fork
8 72.0 Phone
I am trying to use forloop and the loop goes very long. It seems to be very bad method of achieving this. I am trying to use mask operation but unsure how to achieve this. Appreciate help in performing this in the shortest way possible.
The desired result should look like:
Item Matching Number
0 Phone No Match 1.0
1 Watch Matched 3.0
2 Pen Matched 4.0
3 Pencil No Match 5.0
4 Pencil Matched 8.0
5 toolkit No Match 12.0
6 box Matched 32.0
7 fork Matched 58.0
Upvotes: 2
Views: 837
Reputation: 11907
You can come to the dataframe you need by some simple loc
and isin
as shown below
df = df1.copy()
df['Matching'] = np.nan
df.loc[(df.Number.isin(df2.Number)) & (df.Item.isin(df2.Item)), 'Matching'] = 'Matched'
df.Matching.fillna('No Match', inplace=True)
Number Item Matching
1.0 Phone No Match
3.0 Watch Matched
4.0 Pen Matched
5.0 Pencil No Match
8.0 Pencil Matched
12.0 toolkit Matched
32.0 box Matched
58.0 fork Matched
Upvotes: 1
Reputation: 164623
You are looking for a left merge with indicator=True
:
res = pd.merge(df1, df2, how='left', indicator=True)
print(res)
Item Number _merge
0 Phone 1.0 left_only
1 Watch 3.0 both
2 Pen 4.0 both
3 Pencil 5.0 left_only
4 Pencil 8.0 both
5 toolkit 12.0 left_only
6 box 32.0 both
7 fork 58.0 both
In general, avoid explicit for
loops when purpose-built methods are available, as these are generally optimized for performance. You can, if you wish, then replace strings via a dictionary mapping:
d = {'left_only': 'No Match', 'both': 'Matched'}
df['_merge'] = df['_merge'].map(d)
Upvotes: 2
Reputation: 862511
If problem merge float values, is possible multiple by 1000
and cast to integers and then merge
with left join, because there should be problem with matching, baciuse float precision should be different in both columns:
df1['Number1'] = df1['Number'].mul(1000).astype(int)
df2['Number1'] = df2['Number'].mul(1000).astype(int)
df = pd.merge(df1, df2.drop('Number', 1), how='left', on=['Item','Number1'], indicator=True)
df['Matching'] = df['_merge'].map({'left_only':'No Match', 'both':'Match'})
df = df.drop(['Number1','_merge'], axis=1)
print (df)
Number Item Matching
0 1.0 Phone No Match
1 3.0 Watch Match
2 4.0 Pen Match
3 5.0 Pencil No Match
4 8.0 Pencil Match
5 12.0 toolkit No Match
6 32.0 box Match
7 58.0 fork Match
Upvotes: 3