spartacus8w2039
spartacus8w2039

Reputation: 79

How can I apply forward filling to a subset of data in order to ensure continuous monthly data?

I have a below df where below criteria needs to accomplished,

assume: today's month is Dec 2023

Criteria:

  1. theres a submission in Jan 2023 for A1-sup1, and the next submission is in Apr 2023, since feb and march is missing - need to copy all data from jan as a forward fill to Feb 2023 and March 2023
  2. theres a submission in Apr 2023 for A1-sup1, and no submission until today's month i.e. Dec 2023, so copy all data from Apr as a forward fill to may, june, july, aug, sept, oct, nov, dec
  3. theres a submission in Oct for A2-sup2, and no submission until today's month i.e. Dec 2023, so copy all data from Apr as a forward fill to nov, dec

Input dataframe is below,

import pandas as pd

input_data = [
{"submissionmonth": "Jan-2023", "sku": "A1","location": "sup1","forecastmonth": "Jan-2023","cost": 100},
{"submissionmonth": "Jan-2023", "sku": "A1","location": "sup1","forecastmonth": "Feb-2023","cost": 105},
{"submissionmonth": "Jan-2023", "sku": "A1","location": "sup1","forecastmonth": "Mar-2023","cost": 108},
{"submissionmonth": "Jan-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 106},
{"submissionmonth": "Apr-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "Apr-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "Apr-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "Apr-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "Oct-2023", "sku": "A2","location": "sup2","forecastmonth": "Oct-2023","cost": 101},
{"submissionmonth": "Oct-2023", "sku": "A2","location": "sup2","forecastmonth": "Nov-2023","cost": 102},
{"submissionmonth": "Oct-2023", "sku": "A2","location": "sup2","forecastmonth": "Dec-2023","cost": 109},
{"submissionmonth": "Oct-2023", "sku": "A2","location": "sup2","forecastmonth": "Jan-2024","cost": 104},
]

# Create the DataFrame
input_data = pd.DataFrame(input_data)

# Print the DataFrame (optional)
print(input_data.shape)
input_data

Expected result,

import pandas as pd

output_data = [
{"submissionmonth": "Jan-2023", "sku": "A1","location": "sup1","forecastmonth": "Jan-2023","cost": 100},
{"submissionmonth": "Jan-2023", "sku": "A1","location": "sup1","forecastmonth": "Feb-2023","cost": 105},
{"submissionmonth": "Jan-2023", "sku": "A1","location": "sup1","forecastmonth": "Mar-2023","cost": 108},
{"submissionmonth": "Jan-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 106},
{"submissionmonth": "Feb-2023", "sku": "A1","location": "sup1","forecastmonth": "Jan-2023","cost": 100},
{"submissionmonth": "Feb-2023", "sku": "A1","location": "sup1","forecastmonth": "Feb-2023","cost": 105},
{"submissionmonth": "Feb-2023", "sku": "A1","location": "sup1","forecastmonth": "Mar-2023","cost": 108},
{"submissionmonth": "Feb-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 106},
{"submissionmonth": "Mar-2023", "sku": "A1","location": "sup1","forecastmonth": "Jan-2023","cost": 100},
{"submissionmonth": "Mar-2023", "sku": "A1","location": "sup1","forecastmonth": "Feb-2023","cost": 105},
{"submissionmonth": "Mar-2023", "sku": "A1","location": "sup1","forecastmonth": "Mar-2023","cost": 108},
{"submissionmonth": "Mar-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 106},
{"submissionmonth": "Apr-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "Apr-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "Apr-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "Apr-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "May-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "May-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "May-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "May-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "Jun-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "Jun-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "Jun-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "Jun-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "Jul-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "Jul-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "Jul-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "Jul-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "Aug-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "Aug-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "Aug-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "Aug-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "Sept-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "Sept-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "Sept-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "Sept-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "Oct-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "Oct-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "Oct-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "Oct-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "Oct-2023", "sku": "A2","location": "sup2","forecastmonth": "Oct-2023","cost": 101},
{"submissionmonth": "Oct-2023", "sku": "A2","location": "sup2","forecastmonth": "Nov-2023","cost": 102},
{"submissionmonth": "Oct-2023", "sku": "A2","location": "sup2","forecastmonth": "Dec-2023","cost": 109},
{"submissionmonth": "Oct-2023", "sku": "A2","location": "sup2","forecastmonth": "Jan-2024","cost": 104},
{"submissionmonth": "Nov-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "Nov-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "Nov-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "Nov-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "Nov-2023", "sku": "A2","location": "sup2","forecastmonth": "Oct-2023","cost": 101},
{"submissionmonth": "Nov-2023", "sku": "A2","location": "sup2","forecastmonth": "Nov-2023","cost": 102},
{"submissionmonth": "Nov-2023", "sku": "A2","location": "sup2","forecastmonth": "Dec-2023","cost": 109},
{"submissionmonth": "Nov-2023", "sku": "A2","location": "sup2","forecastmonth": "Jan-2024","cost": 104},
{"submissionmonth": "Dec-2023", "sku": "A1","location": "sup1","forecastmonth": "Apr-2023","cost": 101},
{"submissionmonth": "Dec-2023", "sku": "A1","location": "sup1","forecastmonth": "May-2023","cost": 102},
{"submissionmonth": "Dec-2023", "sku": "A1","location": "sup1","forecastmonth": "Jun-2023","cost": 109},
{"submissionmonth": "Dec-2023", "sku": "A1","location": "sup1","forecastmonth": "Jul-2023","cost": 104},
{"submissionmonth": "Dec-2023", "sku": "A2","location": "sup2","forecastmonth": "Oct-2023","cost": 101},
{"submissionmonth": "Dec-2023", "sku": "A2","location": "sup2","forecastmonth": "Nov-2023","cost": 102},
{"submissionmonth": "Dec-2023", "sku": "A2","location": "sup2","forecastmonth": "Dec-2023","cost": 109},
{"submissionmonth": "Dec-2023", "sku": "A2","location": "sup2","forecastmonth": "Jan-2024","cost": 104},
]

# Create the DataFrame
output_data = pd.DataFrame(output_data)

# Print the DataFrame (optional)
print(output_data.shape)
output_data

Upvotes: -1

Views: 71

Answers (1)

Rawson
Rawson

Reputation: 2787

You can use a combination of .reindex, .stack and .ffill. The data is pivoted such that the submission months are unique per row, so that forward filling works. Then the data is returned back into the form of the input_data.

# convert existing months to datetime
submission_months = pd.to_datetime(
    input_data.submissionmonth,
    format="%b-%Y",
)
# all months between min and max
full_months = pd.date_range(
    submission_months.min(),
    submission_months.max(),
    freq="M"
)


output_data = (
    input_data
    # set the index of the columns
    .set_index(["submissionmonth", "forecastmonth"])
    # unstack, so single-level index of submission month
    .unstack("forecastmonth")
    # reindex with all months, with same formatting
    .reindex(
        full_months.strftime("%b-%Y"),
        axis=0,
    )
    # forward fill data
    .ffill()
    # stack the forecast month back into a column, as before
    .stack("forecastmonth")
    # reset index, providing names
    .reset_index(
        names=["submissionmonth", "forecastmonth"]
    )
)

If you want to keep the same column order as before:

output_data.reindex(columns=input_data.columns, inplace=True)

Upvotes: 0

Related Questions