amark
amark

Reputation: 51

Match data between 2 pandas data frame and extract the matching value of another column in Python

I have two pandas data frames.

df1 :
     ACNo       Product
1   12340       100% Hot Care
2   23867       Auction5
3   98372       Edition
4   09837       Diet Parameter
5   54332       Load

df2 :
    ProdDetail                          AttrName
1   12345.567                           Age Confirmation
2   Model1 Count\100% Hot Care          Recipe
3   123445\Handle                       Improve
4   Diet Edition\Parameter              Amount

I want to do a lookup of Product column from df1 on ProdDetail column of df2 and add AttrName column in df1 with the respective values. The string could be at any position in ProdDetails, basically similar to wildcard function in excel. If string appears in ProdDetail of df2, I want to pull the respective AttrName. The result df1 data frame should look like this

        ACNo        Product             AttrName
1       12340       100% Hot Care       Recipe  
2       23867       Auction5            N/A
3       98372       Edition             Amount
4       09837       Diet Parameter      N/A
5       54332       Load                N/A

Could anyone, please help me out with this in python? I have tried multiple ways but not able to find the solution. I saw a similar post but it was in R, could not find in Python. Below is one of the way I tried

ip=df1['Product']
def lookup_prod(ip):
      return df2[(df2['ProdDetail'].str.contains(ip, na=False))]['AttrName']
df1['AttrName'] = data.apply(lambda row: lookup_prod(row['ProdDetails']), axis=1)

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})

df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
                                   r'123445\Handle',  r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe' , 'Improve',  'Amount']})

Upvotes: 3

Views: 3243

Answers (2)

jpp
jpp

Reputation: 164623

One way is to use pd.Series.apply with a custom function and a for loop:

def lookup_prod(ip):
    for row in df2.itertuples():
        if ip in row[1]:
            return row[2]
    else:
        return 'N/A'

df1['AttrName'] = df1['Product'].apply(lookup_prod)

print(df1)

#     ACNo        Product AttrName
# 1  12340        HotCare   Recipe
# 2  23867        Auction      N/A
# 3  98372        Edition   Amount
# 4   9837  DietParameter      N/A
# 5  54332           Load      N/A

Example #2

This method still works:

import pandas as pd

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})

df2 = pd.DataFrame({'ProdDetail': [r'Sesonal Items\Limted  Number', r'Model1 Count\100% Hot Care',
                                   r'123445\Handle',  r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe' , 'Improve',  'Amount']})

def lookup_prod(ip):
    for row in df2.itertuples():
        if ip in str(row.ProdDetail):
            return row.AttrName
    else:
        return 'N/A'

df1['AttrName'] = df1['Product'].apply(lookup_prod)

print(df1)

#     ACNo         Product AttrName
# 0  12340   100% Hot Care   Recipe
# 1  23867        Auction5      N/A
# 2  98372         Edition   Amount
# 3  09837  Diet Parameter      N/A
# 4  54332            Load      N/A

Upvotes: 2

BENY
BENY

Reputation: 323226

I think str.contains still work here

df1.Product.apply(lambda x : df2.AttrName[df2.ProdDetail.str.contains(x)].sum(),1)
Out[805]: 
1    Recipe
2     False
3    Amount
4     False
5     False
Name: Product, dtype: object

Upvotes: 1

Related Questions