James Lane
James Lane

Reputation: 45

Pandas: Search and match based on two conditions

I am using the code below to make a search on a .csv file and match a column in both files and grab a different column I want and add it as a new column. However, I am trying to make the match based on two columns instead of one. Is there a way to do this?

import pandas as pd
df1 = pd.read_csv("matchone.csv")
df2 = pd.read_csv("comingfrom.csv")

def lookup_prod(ip):
    for row in df2.itertuples():
        if ip in row[1]:
            return row[3]
    else:
        return '0'

df1['want'] = df1['name'].apply(lookup_prod)

df1[df1.want != '0']
print(df1)
#df1.to_csv('file_name.csv')

The code above makes a search from the column name 'samename' in both files and gets the column I request ([3]) from the df2. I want to make the code make a match for both column 'name' and another column 'price' and only if both columns in both df1 and df2 match then the code take the value on ([3]).

df 1 :

name price value
a     10    35
b     10    21
c     10    33
d     10    20
e     10    88

df 2 :
name price want
a     10   123
b     5    222
c     10   944
d     10   104
e     5    213

When the code is run (asking for the want column from d2, based on both if df1 name = df2 name) the produced result is :

name price value want
a     10    35   123
b     10    21   222
c     10    33   944
d     10    20   104
e     10    88   213

However, what I want is if both df1 name = df2 name and df1 price = df2 price, then take the column df2 want, so the desired result is:

name price value want
a     10    35   123
b     10    21    0
c     10    33   944
d     10    20   104
e     10    88    0

Upvotes: 2

Views: 1066

Answers (4)

Peeyush Jain
Peeyush Jain

Reputation: 33

#Try this code it will give you expected results

import pandas as pd

df1 = pd.DataFrame({'name' :['a','b','c','d','e'] ,
                    'price' :[10,10,10,10,10],
                    'value' : [35,21,33,20,88]})

df2 = pd.DataFrame({'name' :['a','b','c','d','e'] ,
                    'price' :[10,5,10,10,5],
                    'want' : [123,222,944,104 ,213]})



new = pd.merge(df1,df2, how='left', left_on=['name','price'], right_on=['name','price'])

print(new.fillna(0))

Upvotes: 0

miriess
miriess

Reputation: 164

Expanding on https://stackoverflow.com/a/73830294/20110802:

You can add the validate option to the merge in order to avoid duplication on one side (or both):

pd.merge(df1, df2, on=['name','price'], how='left', validate='1:1').fillna(0)

Also, if the float conversion is a problem for you, one option is to do an inner join first and then pd.concat the result with the "leftover" df1 where you already added a constant valued column. Would look something like:

df_inner = pd.merge(df1, df2, on=['name', 'price'], how='inner', validate='1:1')
merged_pairs = set(zip(df_inner.name, df_inner.price))
df_anti = df1.loc[~pd.Series(zip(df1.name, df1.price)).isin(merged_pairs)]
df_anti['want'] = 0
df_result = pd.concat([df_inner, df_anti]) # perhaps ignore_index=True ?

Looks complicated, but should be quite performant because it filters by set. I think there might be a possibility to set name and price as index, merge on index and then filter by index to not having to do the zip-set-shenanigans, bit I'm no expert on multiindex-handling.

Upvotes: 0

user16836078
user16836078

Reputation:

If you are matching the two dataframes based on the name and the price, you can use df.where and df.isin

df1['want'] = df2['want'].where(df1[['name','price']].isin(df2).all(axis=1)).fillna('0')

df1
  name  price  value   want
0    a     10     35  123.0
1    b     10     21      0
2    c     10     33  944.0
3    d     10     20  104.0
4    e     10     88      0

Upvotes: 0

Boris Silantev
Boris Silantev

Reputation: 763

You need to use pandas.DataFrame.merge() method with multiple keys:

df1.merge(df2, on=['name','price'], how='left').fillna(0)

Method represents missing values as NaNs, so that the column's dtype changes to float64 but you can change it back after filling the missed values with 0.

Also please be aware that duplicated combinations of name and price in df2 will appear several times in the result.

Upvotes: 8

Related Questions