Reputation: 51
I have two pandas data frames.
df1 :
ACNo Product
1 12340 100% Hot Care
2 23867 Auction5
3 98372 Edition
4 09837 Diet Parameter
5 54332 Load
df2 :
ProdDetail AttrName
1 12345.567 Age Confirmation
2 Model1 Count\100% Hot Care Recipe
3 123445\Handle Improve
4 Diet Edition\Parameter Amount
I want to do a lookup of Product column from df1 on ProdDetail column of df2 and add AttrName column in df1 with the respective values. The string could be at any position in ProdDetails, basically similar to wildcard function in excel. If string appears in ProdDetail of df2, I want to pull the respective AttrName. The result df1 data frame should look like this
ACNo Product AttrName
1 12340 100% Hot Care Recipe
2 23867 Auction5 N/A
3 98372 Edition Amount
4 09837 Diet Parameter N/A
5 54332 Load N/A
Could anyone, please help me out with this in python? I have tried multiple ways but not able to find the solution. I saw a similar post but it was in R, could not find in Python. Below is one of the way I tried
ip=df1['Product']
def lookup_prod(ip):
return df2[(df2['ProdDetail'].str.contains(ip, na=False))]['AttrName']
df1['AttrName'] = data.apply(lambda row: lookup_prod(row['ProdDetails']), axis=1)
df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})
df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
r'123445\Handle', r'Diet Edition\Parameter'],
'AttrName': ['Age Confirmation', 'Recipe' , 'Improve', 'Amount']})
Upvotes: 3
Views: 3243
Reputation: 164623
One way is to use pd.Series.apply
with a custom function and a for
loop:
def lookup_prod(ip):
for row in df2.itertuples():
if ip in row[1]:
return row[2]
else:
return 'N/A'
df1['AttrName'] = df1['Product'].apply(lookup_prod)
print(df1)
# ACNo Product AttrName
# 1 12340 HotCare Recipe
# 2 23867 Auction N/A
# 3 98372 Edition Amount
# 4 9837 DietParameter N/A
# 5 54332 Load N/A
Example #2
This method still works:
import pandas as pd
df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})
df2 = pd.DataFrame({'ProdDetail': [r'Sesonal Items\Limted Number', r'Model1 Count\100% Hot Care',
r'123445\Handle', r'Diet Edition\Parameter'],
'AttrName': ['Age Confirmation', 'Recipe' , 'Improve', 'Amount']})
def lookup_prod(ip):
for row in df2.itertuples():
if ip in str(row.ProdDetail):
return row.AttrName
else:
return 'N/A'
df1['AttrName'] = df1['Product'].apply(lookup_prod)
print(df1)
# ACNo Product AttrName
# 0 12340 100% Hot Care Recipe
# 1 23867 Auction5 N/A
# 2 98372 Edition Amount
# 3 09837 Diet Parameter N/A
# 4 54332 Load N/A
Upvotes: 2
Reputation: 323226
I think str.contains
still work here
df1.Product.apply(lambda x : df2.AttrName[df2.ProdDetail.str.contains(x)].sum(),1)
Out[805]:
1 Recipe
2 False
3 Amount
4 False
5 False
Name: Product, dtype: object
Upvotes: 1