Reputation: 3739
I have two Dataframes that store data about products procured in a store. df1
stores data about the store name, product id, product name and the date when the purchase was made. df2
stores data about product id, product name and the type. I am trying to update df2
with the Date recevied value in df1
but only for products of type P
.
Given below are views of the dataframe and what I tried doing.
df1
:
StoreName,ProdId,ProdName,DateReceived
Store A,P1,Prod1,2018-05-01
Store A,P2,Prod2,2018-05-02
Store B,P1,Prod1,2018-05-04
df2
:
DateRecived,ProdId,ProdName,Type
,P1,Prod1,P
,P2,Prod2,P
,P3,Prod3,S
Script:
df2['DateRecived'] = df2['ProdId'].map(df1.set_index('ProdId')['StoreName']).df2['Type'] == 'P'
Running this throws the below error:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Could anyone help me modify the script so that I am able to filter out value by Store Name
and Prod Name
and get df2
populated with the DateReceived
value. Thanks.
Upvotes: 9
Views: 25179
Reputation: 863531
Problem is duplicates - P1
product is twice:
s = df1.set_index('ProdId')['StoreName']
print (s)
ProdId
P1 Store A
P2 Store A
P1 Store B
Name: StoreName, dtype: object
So need unique values, drop_duplicates
keep only first value:
s = df1.drop_duplicates('ProdId').set_index('ProdId')['StoreName']
print (s)
ProdId
P1 Store A
P2 Store A
Name: StoreName, dtype: object
And then is possible repalce by boolean mask:
mask = df2['Type'] == 'P'
df2['DateRecived'] = df2['DateRecived'].mask(mask, df2['ProdId'].map(s))
print (df2)
DateRecived ProdId ProdName Type
0 Store A P1 Prod1 P
1 Store A P2 Prod2 P
2 NaN P3 Prod3 S
df2.loc[mask, 'DateRecived'] = df2.loc[mask, 'ProdId'].map(s)
print (df2)
DateRecived ProdId ProdName Type
0 Store A P1 Prod1 P
1 Store A P2 Prod2 P
2 NaN P3 Prod3 S
Upvotes: 11