Reputation: 7832
I have a pivot pandas data frame (sales by region) that got created from another pandas data frame (sales by store) using the pivot_table method.
As an example:
df = pd.DataFrame(
{'store':['A','B','C','D','E']*7,
'region':['NW','NW','SW','NE','NE']*7,
'date':['2017-03-30']*5+['2017-04-05']*5+['2017-04-07']*5+['2017-04-12']*5+['2017-04-13']*5+['2017-04-17']*5+['2017-04-20']*5,
'sales':[30,1,133,9,1,30,3,135,9,11,30,1,140,15,15,25,10,137,9,3,29,10,137,9,11,30,19,145,20,10,30,8,141,25,25]
})
df_sales = df.pivot_table(index = ['region'], columns = ['date'], aggfunc = [np.sum], margins = True)
df_sales = df_sales.ix[:,range(0, df_sales.shape[1]-1)]
My goal is to do the following to the sales data frame.
Add a column called week difference that computes the difference between the total sales for this week, and the latest value (by date) for the previous week. Assumption: I always have data for some days a week but it's not fixed days. The week difference column will be different as new data comes in, but for the latest data would look like:
>>> df_sales
sum \
sales
date 2017-03-30 2017-04-05 2017-04-07 2017-04-12 2017-04-13 2017-04-17
region
NE 10.0 20.0 30.0 12.0 20.0 30.0
NW 31.0 33.0 31.0 35.0 39.0 49.0
SW 133.0 135.0 140.0 137.0 137.0 145.0
All 174.0 188.0 201.0 184.0 196.0 224.0
date 2017-04-20 WeekDifference
region
NE 50.0 50.0-20.0
NW 38.0 38.0-39.0
SW 141.0 141.0-137.0
All 229.0 229-196.0
Because it's the difference between the latest date and the latest day of the previous week. In this specific example, we are on week 2017-04-20, and the last day of data from previous week is 2017-04-13.
I'd want to do this in a general way as data gets updated.
Upvotes: 1
Views: 338
Reputation: 153460
df = pd.DataFrame(
{'store':['A','B','C','D','E']*7,
'region':['NW','NW','SW','NE','NE']*7,
'date':['2017-03-30']*5+['2017-04-05']*5+['2017-04-07']*5+['2017-04-12']*5+['2017-04-13']*5+['2017-04-17']*5+['2017-04-20']*5,
'sales':[30,1,133,9,1,30,3,135,9,11,30,1,140,15,15,25,10,137,9,3,29,10,137,9,11,30,19,145,20,10,30,8,141,25,25]
})
df_sales = df.pivot_table(index = ['region'], columns = ['date'], aggfunc = [np.sum], margins = True)
df_sales = df_sales.ix[:,range(0, df_sales.shape[1]-1)]
Input:
sum \
sales
date 2017-03-30 2017-04-05 2017-04-07 2017-04-12 2017-04-13 2017-04-17
region
NE 10.0 20.0 30.0 12.0 20.0 30.0
NW 31.0 33.0 31.0 35.0 39.0 49.0
SW 133.0 135.0 140.0 137.0 137.0 145.0
All 174.0 188.0 201.0 184.0 196.0 224.0
date 2017-04-20 weekdiffernce
region
NE 50.0 50.0 - 20.0
NW 38.0 38.0 - 39.0
SW 141.0 141.0 - 137.0
All 229.0 229.0 - 196.0
Calculate Last week and one week offset:
last_column = pd.to_datetime(df_sales.iloc[:,-1].name[2])
last_week_column = last_column + pd.DateOffset(weeks=-1)
col_mask = (pd.to_datetime(df_sales.columns.get_level_values(2)).weekofyear == (last_column.weekofyear-1))
df_sales.loc[:,('sum','sales','weekdiffernce')]=df_sales.iloc[:,-1].astype(str) + ' - '+df_sales.loc[:,('sum','sales',last_week_column.strftime('%Y-%m-%d'))].astype(str)
df_sales.loc[:,('sum','sales','weekdiffernce')]=df_sales.iloc[:,-1].astype(str) + ' - '+df_sales.loc[:,('sum','sales',list(col_mask))].iloc[:,-1].astype(str)
print(df_sales)
Output:
sum \
sales
date 2017-03-30 2017-04-05 2017-04-07 2017-04-12 2017-04-13 2017-04-17
region
NE 10.0 20.0 30.0 12.0 20.0 30.0
NW 31.0 33.0 31.0 35.0 39.0 49.0
SW 133.0 135.0 140.0 137.0 137.0 145.0
All 174.0 188.0 201.0 184.0 196.0 224.0
date 2017-04-20 weekdiffernce
region
NE 50.0 50.0 - 20.0
NW 38.0 38.0 - 39.0
SW 141.0 141.0 - 137.0
All 229.0 229.0 - 196.0
Upvotes: 1