Reputation: 28565
I've been able to successfully calculate the changes over week to week with my data quite fine. However, my data includes thousands of groups that I need to have sorted by. So I am looking for a faster/more efficient way to calculate these week by week changes than how I am currently implementing it.
The way it currently runs, is I have a for loop that does the week to week changes for each subset/store_ID. The calculation works great, but with over 10,000 different items to do this for, takes a rather long time to do. Is there a way to do this by grouping my 'store_ID' column? I've been playing with the .groupby
...but not quite sure how to work with it since it's a groupby object.
Here's my code and how it is working:
I have a dataframe called df
with all my infomation. It was already cleanded and sorted so each store_ID is in ascending order by week. And to keep the idea simple, let's just say I only have these columns:
df[['store_ID', 'Week', 'Sales']]
so....
# Create list of each store
list_of_stores = list(df['store_ID'].unique())
# Create dataframe to dump the results into
results_df = pd.DataFrame()
# Iterate store-by-store to calculate the week to week values
for store in list_of_stores:
# Create a temporary dataframe to do the calculation for the store_ID
temp_df = pd.DataFrame()
temp_df = df[df['store_ID'] == store]
index_list = list(temp_df.index)
temp_df.index = temp_df['Week']
temp_df['Sales_change_1_week']= temp_df['Sales'] -
temp_df['Sales'].shift(1, freq=Week())
temp_df.index = index_list
# Dump the temporary dataframe into a results dataframe
results_df = results_df.append(temp_df)
So at the end, I have the completed results for all the store_IDs, for each week. I do have to note, there are some missing weeks, so in that case, I do have nulls for weeks that could not calculate the change from previous week, and I'm fine with that.
So I take each store_ID:
I feel there's a way to do this all at once as opposed to individually handling each store_ID, but can't seem to find how.
Upvotes: 1
Views: 1942
Reputation: 6091
This is the code I use for something similar:
week_freq = 'W-TUE'
temp_df['Sales_change_1_week] = temp_df['Sales'].asfreq(week_freq).diff()
Upvotes: 2