Reputation: 135
I have a process which produces a dataframe containing a forecasting for products (and versions) in a weekly format (wc/ Monday dates - column names as strings). Example:
product version 2021-06-07 2021-06-14 2021-06-21 2021-06-28
a 1 500 400 300 200
a 2 750 600 450 200
b 1 200 150 100 100
b 2 500 400 300 200
I've been asked to change the forecast into a monthly forecast instead of a weekly one. Example:
product version Jun-21 Jul-21 Aug-21 Sep-21
a 1 350 x x x
a 2 500 x x x
b 1 100 x x x
b 2 350 x x x
Numbers are for show - what I'm trying to do is average the weekly columns (for each row) to create monthly outputs but in an accurate fashion, i.e. if a weekly column was wc/ 26th Feb, then only 3 days worth will be included in the average for February and only 4 days for March.
I know that this is only a question of formatting / bucketing but I am struggling to come up with a solution as I've never had to do something like this before.
Not expecting a full solution but a point in the right direction for how I should approach the task would be much appreciated.
Upvotes: 1
Views: 635
Reputation: 16147
It's a bit of a process, since you need to calculate days in the month, identify which ones flow into the next month, do the math and shift them forward. This should do the trick.
import pandas as pd
import numpy as np
df = pd.DataFrame({'product': ['a', 'a', 'b', 'b'],
'version': [1, 2, 1, 2],
'6/7/2021': [500, 750, 200, 500],
'6/14/2021': [400, 600, 150, 400],
'6/21/2021': [300, 450, 100, 300],
'6/28/2021': [200, 200, 100, 200],
})
# Convert data to long format
df = df.melt(id_vars=['product','version'], var_name='date')
# Convert date to datetime object
df['date'] = pd.to_datetime(df['date'])
# Add 7 days to the day of the month to compare to the number of days in a month
df['month_day'] = df['date'].dt.day + 7
# Get the number of days in the month
df['days_in_month'] = df['date'].dt.daysinmonth
# Subtract to see how many days the current date would extend into the next month
df['overrun'] = df['month_day']-df['days_in_month']
# Calculate the percentage of the values to push forward into the next month
df['push_forward'] = np.where(df['overrun']>0, df['value']/df['days_in_month']*df['overrun'], 0)
# Reduce the current values by the amount to be pushed forward
df['value'] = df['value'] - df['push_forward']
# Copy the records with a push_forward value to a new dataframe
df2 = df.loc[df['push_forward']>0].copy()
# Drop push_foward column
df.drop(columns='push_forward', inplace=True)
# Add a week to the date values of records with a push_foward value
df2['date'] = df2['date']+pd.DateOffset(weeks=1)
# Merge the pushed data back to the original dataframe
df = df.merge(df2[['product','version','date','push_forward']], on=['product','version','date'], how='outer')
# Fill null values
df.fillna(0, inplace=True)
# Add the push forward values to their respective weekly values
df['value'] = df['value'] + df['push_forward']
# Convert date to just the month
df['date'] = df['date'].dt.strftime('%Y-%m')
# Group and take the average
df = df.groupby(['product','version','date'])['value'].mean().reset_index()
# # Create final pivot table
df.pivot_table(index=['product','version'], columns='date', values='value')
Output
date 2021-06 2021-07
product version
a 1 341.666667 33.333333
2 491.666667 33.333333
b 1 133.333333 16.666667
2 341.666667 33.333333
Upvotes: 1
Reputation: 1030
This problem can be solved by melting the Dataframe into long-form (instead of wide-form). In the example below, we translate into long-form, group by the year-month pairs, take the mean, then translate back to wide-form. During the melt
and pivot
operations, some multi-indexes are created, so we have to handle that as well (last line of code).
import pandas as pd
df = pd.DataFrame({
"product": ["a", "a", "b", 'b'],
"version": ["1", "2", "1", '2'],
"2021-06-07": [500, 750, 200, 500],
"2021-06-14": [400, 600, 150, 400],
"2021-06-21": [300, 450, 100, 300],
"2021-06-28": [200, 200, 100, 200],
"2021-07-07": [500, 750, 200, 500],
"2021-07-14": [400, 600, 150, 400],
"2021-07-21": [300, 450, 100, 300],
"2021-07-28": [200, 200, 100, 200],
})
# First, we melt into long-form data
df = df.melt(id_vars=['product', 'version'], var_name='date')
# Truncate the string to only use year-month format
df['date'] = df['date'].apply(lambda x: x[:7])
# Group by product/version/date, then take the mean
df = df.groupby(['product', 'version', 'date']).mean()
# Pivot back to wide-form table
df = df.pivot_table(index=['product', 'version'], columns='date').reset_index()
# Reset column index from multi-index to single string
df.columns = [x[0] if not x[1] else x[1] for x in df.columns]
Upvotes: 0