SlowlyLearning
SlowlyLearning

Reputation: 135

Converting a weekly forecast (Pandas df) into monthly format

I have a process which produces a dataframe containing a forecasting for products (and versions) in a weekly format (wc/ Monday dates - column names as strings). Example:

product     version     2021-06-07     2021-06-14     2021-06-21     2021-06-28

   a           1           500            400            300            200

   a           2           750            600            450            200

   b           1           200            150            100            100

   b           2           500            400            300            200

I've been asked to change the forecast into a monthly forecast instead of a weekly one. Example:

product     version       Jun-21         Jul-21         Aug-21         Sep-21

   a           1           350             x              x              x

   a           2           500             x              x              x
 
   b           1           100             x              x              x

   b           2           350             x              x              x

Numbers are for show - what I'm trying to do is average the weekly columns (for each row) to create monthly outputs but in an accurate fashion, i.e. if a weekly column was wc/ 26th Feb, then only 3 days worth will be included in the average for February and only 4 days for March.

I know that this is only a question of formatting / bucketing but I am struggling to come up with a solution as I've never had to do something like this before.

Not expecting a full solution but a point in the right direction for how I should approach the task would be much appreciated.

Upvotes: 1

Views: 635

Answers (2)

Chris
Chris

Reputation: 16147

It's a bit of a process, since you need to calculate days in the month, identify which ones flow into the next month, do the math and shift them forward. This should do the trick.

import pandas as pd
import numpy as np

df = pd.DataFrame({'product': ['a', 'a', 'b', 'b'],
 'version': [1, 2, 1, 2],
 '6/7/2021': [500, 750, 200, 500],
 '6/14/2021': [400, 600, 150, 400],
 '6/21/2021': [300, 450, 100, 300],
 '6/28/2021': [200, 200, 100, 200],
 })

# Convert data to long format
df = df.melt(id_vars=['product','version'], var_name='date')
# Convert date to datetime object
df['date'] = pd.to_datetime(df['date'])

# Add 7 days to the day of the month to compare to the number of days in a month
df['month_day'] = df['date'].dt.day + 7

# Get the number of days in the month
df['days_in_month'] = df['date'].dt.daysinmonth

# Subtract to see how many days the current date would extend into the next month
df['overrun'] = df['month_day']-df['days_in_month']

# Calculate the percentage of the values to push forward into the next month
df['push_forward'] = np.where(df['overrun']>0, df['value']/df['days_in_month']*df['overrun'], 0)

# Reduce the current values by the amount to be pushed forward
df['value'] = df['value'] - df['push_forward']

# Copy the records with a push_forward value to a new dataframe
df2 = df.loc[df['push_forward']>0].copy()

# Drop push_foward column
df.drop(columns='push_forward', inplace=True)

# Add a week to the date values of records with a push_foward value
df2['date'] = df2['date']+pd.DateOffset(weeks=1)

# Merge the pushed data back to the original dataframe
df = df.merge(df2[['product','version','date','push_forward']], on=['product','version','date'], how='outer')

# Fill null values
df.fillna(0, inplace=True)

# Add the push forward values to their respective weekly values
df['value'] = df['value'] + df['push_forward']

# Convert date to just the month
df['date'] = df['date'].dt.strftime('%Y-%m')

# Group and take the average
df = df.groupby(['product','version','date'])['value'].mean().reset_index()


# # Create final pivot table
df.pivot_table(index=['product','version'], columns='date', values='value')

Output

            date       2021-06    2021-07
product version     
      a        1    341.666667  33.333333
               2    491.666667  33.333333
      b        1    133.333333  16.666667
               2    341.666667  33.333333

Upvotes: 1

grey_ranger
grey_ranger

Reputation: 1030

This problem can be solved by melting the Dataframe into long-form (instead of wide-form). In the example below, we translate into long-form, group by the year-month pairs, take the mean, then translate back to wide-form. During the melt and pivot operations, some multi-indexes are created, so we have to handle that as well (last line of code).

import pandas as pd

df = pd.DataFrame({
    "product": ["a", "a", "b", 'b'],
    "version": ["1", "2", "1", '2'],
    "2021-06-07": [500, 750, 200, 500],
    "2021-06-14": [400, 600, 150, 400],
    "2021-06-21": [300, 450, 100, 300],
    "2021-06-28": [200, 200, 100, 200],
    "2021-07-07": [500, 750, 200, 500],
    "2021-07-14": [400, 600, 150, 400],
    "2021-07-21": [300, 450, 100, 300],
    "2021-07-28": [200, 200, 100, 200],
})

# First, we melt into long-form data
df = df.melt(id_vars=['product', 'version'], var_name='date')

# Truncate the string to only use year-month format
df['date'] = df['date'].apply(lambda x: x[:7])

# Group by product/version/date, then take the mean
df = df.groupby(['product', 'version', 'date']).mean()

# Pivot back to wide-form table
df = df.pivot_table(index=['product', 'version'], columns='date').reset_index()

# Reset column index from multi-index to single string
df.columns = [x[0] if not x[1] else x[1] for x in df.columns]

Upvotes: 0

Related Questions