Ryan Erwin
Ryan Erwin

Reputation: 817

Conditional Resample - Pandas

I have a situation where I want to calculate daily spend based on a contract start date, end date, and total spend. The next step is to sum up the daily spend amounts by month. This is quite easy, which I'll show I accomplish later. However, there's another date (inv date); if the start is before inv date then all daily amounts prior to inv date should be summed and included in the same period as the inv date.

First, the easy part.

df = pd.DataFrame({'start': ['1/1/2018'], 
               'end': ['3/15/2018'], 
               'inv date': ['2/1/2018'],
               'spend': [400]})


         start        end   inv date    spend
0     1/1/2018  3/15/2018   2/1/2018      400

Create the range of dates for the contract

prd = pd.period_range(df.loc[0, 'start'], df.loc[0, 'end'], freq='D')
prd = pd.Series(1, prd) # empty series to get the number of days in the monthly period
prd = prd.resample('M').size() * (df.loc[0, 'spend'] / prd.resample('M').size().sum())

Which gives me the following series: total spend allocated, by month, based on the number of days in the month...so far, so good.

prd

2018-01    167.567568
2018-02    151.351351
2018-03     81.081081
Freq: M, dtype: float64 

The Hard Part (for me)

As I mentioned above, based on the inv date, the series should actually look like this:

2018-01             0 
2018-02    318.918919
2018-03     81.081081
Freq: M, dtype: float64

Because, the inv date was after the start date, so I don't want to add it until after inv date. FYI: 318 = 167 + 151 (i.e., for the total in 2/2018 above).

Is this possible with resample? If not, what's the most efficient way to accomplish this?

Upvotes: 1

Views: 677

Answers (1)

Peter Leimbigler
Peter Leimbigler

Reputation: 11105

Starting with df and prd from the end of your "easy part" section:

# Get the inv date as a pandas Timestamp
invdate = pd.to_datetime(df['inv date'])[0]
oneday = pd.offsets.Day(1)
# Pandas slicing includes BOTH endpoints, so we need this one-day 
# offset to get all values strictly before the inv date
prd.loc[invdate] = prd[invdate] + prd[:invdate - oneday].sum()
prd.loc[:invdate - oneday] = 0

# output
2018-01      0.000000
2018-02    318.918919
2018-03     81.081081
Freq: M, dtype: float64

Upvotes: 1

Related Questions