chitown88
chitown88

Reputation: 28565

calculate week over week changes in Pandas (with a groupby)?

I've been able to successfully calculate the changes over week to week with my data quite fine. However, my data includes thousands of groups that I need to have sorted by. So I am looking for a faster/more efficient way to calculate these week by week changes than how I am currently implementing it.

The way it currently runs, is I have a for loop that does the week to week changes for each subset/store_ID. The calculation works great, but with over 10,000 different items to do this for, takes a rather long time to do. Is there a way to do this by grouping my 'store_ID' column? I've been playing with the .groupby...but not quite sure how to work with it since it's a groupby object.

Here's my code and how it is working:

I have a dataframe called df with all my infomation. It was already cleanded and sorted so each store_ID is in ascending order by week. And to keep the idea simple, let's just say I only have these columns:

df[['store_ID', 'Week', 'Sales']]

so....

# Create list of each store
list_of_stores = list(df['store_ID'].unique())

# Create dataframe to dump the results into
results_df = pd.DataFrame()

# Iterate store-by-store to calculate the week to week values
for store in list_of_stores:

    # Create a temporary dataframe to do the calculation for the store_ID
    temp_df = pd.DataFrame()
    temp_df = df[df['store_ID'] == store]
    index_list = list(temp_df.index)
    temp_df.index = temp_df['Week']
    temp_df['Sales_change_1_week']= temp_df['Sales'] - 
    temp_df['Sales'].shift(1, freq=Week())
    temp_df.index = index_list

    # Dump the temporary dataframe into a results dataframe
    results_df = results_df.append(temp_df)

So at the end, I have the completed results for all the store_IDs, for each week. I do have to note, there are some missing weeks, so in that case, I do have nulls for weeks that could not calculate the change from previous week, and I'm fine with that.

So I take each store_ID:

  1. create a temp dataframe that has it sorted by 'Week'.
  2. I store the original index
  3. then re-index by the week (so that it can do the shift by week).
  4. calculate the change of the Sales week-by-week and put into new column
  5. re-index back to the original index
  6. append that to a results dataframe
  7. repeat with the next store_ID

I feel there's a way to do this all at once as opposed to individually handling each store_ID, but can't seem to find how.

Upvotes: 1

Views: 1942

Answers (1)

Yuca
Yuca

Reputation: 6091

This is the code I use for something similar:

week_freq = 'W-TUE'
temp_df['Sales_change_1_week] = temp_df['Sales'].asfreq(week_freq).diff()

Upvotes: 2

Related Questions