Reputation: 1470
I have a large DataFrame that looks something like this: df =
UPC Unit_Sales Price Price_Change Date
0 22 15 1.99 NaN 2017-10-10
1 22 7 2.19 True 2017-10-12
2 22 6 2.19 NaN 2017-10-13
3 22 7 1.99 True 2017-10-16
4 22 4 1.99 NaN 2017-10-17
5 35 15 3.99 NaN 2017-10-09
6 35 17 3.99 NaN 2017-10-11
7 35 5 4.29 True 2017-10-13
8 35 8 4.29 NaN 2017-10-15
9 35 2 4.29 NaN 2017-10-15
Basically, I am trying to record how the sales of a product (UPC) reacted during the 7 days after its price changed. I want to create a new column, ['Reaction'], that records the sum of the unit sales from the day of the price change through 7 days forward. Keep in mind that a UPC sometimes has more than 2 price changes, so I want a separate sum for each price change. So I want to see this:
UPC Unit_Sales Price Price_Change Date Reaction
0 22 15 1.99 NaN 2017-10-10 NaN
1 22 7 2.19 True 2017-10-12 13
2 22 6 2.19 NaN 2017-10-13 NaN
3 22 7 1.99 True 2017-10-16 11
4 22 4 1.99 NaN 2017-10-19 NaN
5 35 15 3.99 NaN 2017-10-09 NaN
6 35 17 3.99 NaN 2017-10-11 NaN
7 35 5 4.29 True 2017-10-13 15
8 35 8 4.29 NaN 2017-10-15 NaN
9 35 2 4.29 NaN 2017-10-18 NaN
What makes this difficult is how the dates are set up in my data. Sometimes (like for UPC 35) the dates don't extend a full 7 days past the price change. In that case I would want it to default to the next nearest date, or to however many dates there are (if there are fewer than 7 days).
Here's what I've tried: I converted the date column to datetime, and I'm thinking of counting days with the .days attribute. This is roughly how I'm thinking of setting the code up (rough draft):
x = df.loc[df['Price_Change'] == 'True']
for x in df:
df['Reaction'] = sum(df.Unit_Sales[1day :8days])
Is there an easier way to do this, maybe without a for loop?
Upvotes: 4
Views: 1540
Reputation: 323226
You just need ffill with groupby:
df.loc[df.Price_Change==True,'Reaction']=df.groupby('UPC').apply(lambda x : (x['Price_Change'].ffill()*x['Unit_Sales']).sum()).values
df
Out[807]:
UPC Unit_Sales Price Price_Change Date Reaction
0 22 15 1.99 NaN 2017-10-10 NaN
1 22 7 2.19 True 2017-10-12 24.0
2 22 6 2.19 NaN 2017-10-13 NaN
3 22 7 2.19 NaN 2017-10-16 NaN
4 22 4 2.19 NaN 2017-10-17 NaN
5 35 15 3.99 NaN 2017-10-09 NaN
6 35 17 3.99 NaN 2017-10-11 NaN
7 35 5 4.29 True 2017-10-13 15.0
8 35 8 4.29 NaN 2017-10-15 NaN
9 35 2 4.29 NaN 2017-10-15 NaN
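Why this works: ffill carries the True flag forward within each UPC, so only the rows from the price change onward contribute to the sum, while the rows before it stay NaN and drop out. Here is a minimal, self-contained sketch of the same idea, assuming a single price change per UPC as in the output above. The sample frame and the names flag and totals are made up for illustration, and it uses transform rather than apply, so it is a variation on the line above, not the exact same code:

```python
import numpy as np
import pandas as pd

# Illustrative data: one price change per UPC, matching the output above.
df = pd.DataFrame({
    "UPC": [22] * 5 + [35] * 5,
    "Unit_Sales": [15, 7, 6, 7, 4, 15, 17, 5, 8, 2],
    "Price_Change": [np.nan, True, np.nan, np.nan, np.nan,
                     np.nan, np.nan, True, np.nan, np.nan],
})

# Forward-fill the flag within each UPC: rows before the change stay NaN,
# rows from the change onward become True.
flag = df.groupby("UPC")["Price_Change"].ffill()

# Zero out sales before the change, then total what is left per UPC.
totals = df["Unit_Sales"].where(flag.notna(), 0).groupby(df["UPC"]).transform("sum")

# Show the total only on the row where the price actually changed.
df["Reaction"] = totals.where(df["Price_Change"].eq(True))
print(df)
```

This gives 24 for UPC 22 and 15 for UPC 35, the same values as the output above.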
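Update: to get a separate sum for each price change, number the price changes with cumsum first and group on that number as well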
df['New']=df.groupby('UPC').apply(lambda x : x['Price_Change']==True).cumsum().values
v1=df.groupby(['UPC','New']).apply(lambda x : (x['Price_Change'].ffill()*x['Unit_Sales']).sum())
df=df.merge(v1.reset_index())
df[0]=df[0].mask(df['Price_Change']!=True)
df
Out[927]:
UPC Unit_Sales Price Price_Change Date New 0
0 22 15 1.99 NaN 2017-10-10 0 NaN
1 22 7 2.19 True 2017-10-12 1 13.0
2 22 6 2.19 NaN 2017-10-13 1 NaN
3 22 7 1.99 True 2017-10-16 2 11.0
4 22 4 1.99 NaN 2017-10-17 2 NaN
5 35 15 3.99 NaN 2017-10-09 2 NaN
6 35 17 3.99 NaN 2017-10-11 2 NaN
7 35 5 4.29 True 2017-10-13 3 15.0
8 35 8 4.29 NaN 2017-10-15 3 NaN
9 35 2 4.29 NaN 2017-10-15 3 NaN
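Note that the code above sums everything up to the next price change and does not cap the window at 7 days. Below is a rough sketch of one way to add that cap. It assumes Date is a datetime column, that the window runs from the change date through 7 calendar days later (inclusive), and that a window also ends at the next price change, as the expected output in the question suggests. The names is_change, segment, start and in_window are illustrative only:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "UPC": [22] * 5 + [35] * 5,
    "Unit_Sales": [15, 7, 6, 7, 4, 15, 17, 5, 8, 2],
    "Price_Change": [np.nan, True, np.nan, True, np.nan,
                     np.nan, np.nan, True, np.nan, np.nan],
    "Date": pd.to_datetime([
        "2017-10-10", "2017-10-12", "2017-10-13", "2017-10-16", "2017-10-17",
        "2017-10-09", "2017-10-11", "2017-10-13", "2017-10-15", "2017-10-15",
    ]),
})

# The forward fill below relies on rows being in date order within each UPC.
df = df.sort_values(["UPC", "Date"])

is_change = df["Price_Change"].eq(True)

# Number the price changes within each UPC; segment 0 precedes the first change.
segment = is_change.astype(int).groupby(df["UPC"]).cumsum()

# Date of the most recent price change for each row (NaT before the first one).
start = df["Date"].where(is_change).groupby(df["UPC"]).ffill()

# Assumption: "7 days forward" means up to and including 7 days after the change.
in_window = (df["Date"] - start) <= pd.Timedelta(days=7)

# Sum the in-window sales per (UPC, price change), shown only on the change row.
totals = df["Unit_Sales"].where(in_window, 0).groupby([df["UPC"], segment]).transform("sum")
df["Reaction"] = totals.where(is_change)
print(df)
```

With the question's sample data every sale falls inside its window, so the cap changes nothing and the sums are still 13, 11 and 15. It only matters when more than a week passes before the next price change.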
Upvotes: 2