Dnaiel
Dnaiel

Reputation: 7832

Week difference from current week to last day previous week

I have a pivot pandas data frame (sales by region) that got created from another pandas data frame (sales by store) using the pivot_table method.

As an example:

df = pd.DataFrame(
    {'store':['A','B','C','D','E']*7, 
     'region':['NW','NW','SW','NE','NE']*7, 
     'date':['2017-03-30']*5+['2017-04-05']*5+['2017-04-07']*5+['2017-04-12']*5+['2017-04-13']*5+['2017-04-17']*5+['2017-04-20']*5,
     'sales':[30,1,133,9,1,30,3,135,9,11,30,1,140,15,15,25,10,137,9,3,29,10,137,9,11,30,19,145,20,10,30,8,141,25,25]
     })
df_sales = df.pivot_table(index = ['region'], columns = ['date'], aggfunc = [np.sum], margins = True)
df_sales = df_sales.ix[:,range(0, df_sales.shape[1]-1)]

My goal is to do the following to the sales data frame.

Add a column called week difference that computes the difference between the total sales for this week, and the latest value (by date) for the previous week. Assumption: I always have data for some days a week but it's not fixed days. The week difference column will be different as new data comes in, but for the latest data would look like:

>>> df_sales
              sum                                                         \
            sales                                                          
date   2017-03-30 2017-04-05 2017-04-07 2017-04-12 2017-04-13 2017-04-17   
region                                                                     
NE           10.0       20.0       30.0       12.0       20.0       30.0   
NW           31.0       33.0       31.0       35.0       39.0       49.0   
SW          133.0      135.0      140.0      137.0      137.0      145.0   
All         174.0      188.0      201.0      184.0      196.0      224.0   



date   2017-04-20 WeekDifference 
region             
NE           50.0    50.0-20.0
NW           38.0    38.0-39.0
SW          141.0    141.0-137.0
All         229.0    229-196.0

Because it's the difference between the latest date and the latest day of the previous week. In this specific example, we are on week 2017-04-20, and the last day of data from previous week is 2017-04-13.

I'd want to do this in a general way as data gets updated.

Upvotes: 1

Views: 338

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

df = pd.DataFrame(
    {'store':['A','B','C','D','E']*7, 
     'region':['NW','NW','SW','NE','NE']*7, 
     'date':['2017-03-30']*5+['2017-04-05']*5+['2017-04-07']*5+['2017-04-12']*5+['2017-04-13']*5+['2017-04-17']*5+['2017-04-20']*5,
     'sales':[30,1,133,9,1,30,3,135,9,11,30,1,140,15,15,25,10,137,9,3,29,10,137,9,11,30,19,145,20,10,30,8,141,25,25]
     })
df_sales = df.pivot_table(index = ['region'], columns = ['date'], aggfunc = [np.sum], margins = True)
df_sales = df_sales.ix[:,range(0, df_sales.shape[1]-1)]

Input:

              sum                                                         \
            sales                                                          
date   2017-03-30 2017-04-05 2017-04-07 2017-04-12 2017-04-13 2017-04-17   
region                                                                     
NE           10.0       20.0       30.0       12.0       20.0       30.0   
NW           31.0       33.0       31.0       35.0       39.0       49.0   
SW          133.0      135.0      140.0      137.0      137.0      145.0   
All         174.0      188.0      201.0      184.0      196.0      224.0   



date   2017-04-20  weekdiffernce  
region                            
NE           50.0    50.0 - 20.0  
NW           38.0    38.0 - 39.0  
SW          141.0  141.0 - 137.0  
All         229.0  229.0 - 196.0  

Calculate Last week and one week offset:

last_column = pd.to_datetime(df_sales.iloc[:,-1].name[2])

last_week_column = last_column + pd.DateOffset(weeks=-1)

col_mask = (pd.to_datetime(df_sales.columns.get_level_values(2)).weekofyear == (last_column.weekofyear-1))    

df_sales.loc[:,('sum','sales','weekdiffernce')]=df_sales.iloc[:,-1].astype(str) + ' - '+df_sales.loc[:,('sum','sales',last_week_column.strftime('%Y-%m-%d'))].astype(str)

df_sales.loc[:,('sum','sales','weekdiffernce')]=df_sales.iloc[:,-1].astype(str) + ' - '+df_sales.loc[:,('sum','sales',list(col_mask))].iloc[:,-1].astype(str)

print(df_sales)

Output:

              sum                                                         \
            sales                                                          
date   2017-03-30 2017-04-05 2017-04-07 2017-04-12 2017-04-13 2017-04-17   
region                                                                     
NE           10.0       20.0       30.0       12.0       20.0       30.0   
NW           31.0       33.0       31.0       35.0       39.0       49.0   
SW          133.0      135.0      140.0      137.0      137.0      145.0   
All         174.0      188.0      201.0      184.0      196.0      224.0   



date   2017-04-20  weekdiffernce  
region                            
NE           50.0    50.0 - 20.0  
NW           38.0    38.0 - 39.0  
SW          141.0  141.0 - 137.0  
All         229.0  229.0 - 196.0

Upvotes: 1

Related Questions