Chris Withers
Chris Withers

Reputation: 11110

pandas broadcast function to ensure monotonically decreasing series

I'm wondering if there's a succinct, broadcast way to take a series/frame such as this:

>>> print(pd.DataFrame({'a': [5, 10, 9, 11, 13, 14, 12]}, index=pd.date_range('2020-12-01', periods=7)))
             a
2020-12-01   5
2020-12-02  10
2020-12-03   9
2020-12-04  11
2020-12-05  13
2020-12-06  14
2020-12-07  12

...and turn it into:

>>> print(pd.DataFrame({'a': [5, 9, 9, 11, 12, 12, 12]}, index=pd.date_range('2020-12-01', periods=7)))
             a
2020-12-01   5
2020-12-02   9
2020-12-03   9
2020-12-04  11
2020-12-05  12
2020-12-06  12
2020-12-07  12

NB: The important part is that the most recent value is kept, and and previous values that exceed it are replaced with the current value goes backwards in time, resulting in a monotonically increasing series of numbers such that any modifications do not increase a modified value.

The actual data is hierarchical and has multiple columns, but I wanted to start with something I could get my head around:

                            any      full
date       areaCode                      
2020-12-08 E92000001        0.0       0.0
           N92000002        0.0       0.0
           S92000003        0.0       0.0
           W92000004        0.0       0.0
2020-12-09 E92000001    11115.2       0.0
           N92000002      724.6       0.0
           S92000003     3801.8       0.0
           W92000004     1651.4       0.0
...
2021-01-24 E92000001  5727693.0  441684.0
           N92000002   159642.0   22713.0
           S92000003   415402.0    5538.0
           W92000004   270833.0     543.0
2021-01-25 E92000001  5962544.0  443010.0

Upvotes: 1

Views: 66

Answers (2)

Scott Boston
Scott Boston

Reputation: 153500

Here's another way:

df.sort_index(ascending=False).cummin().sort_index()

             a
2020-12-01   5
2020-12-02   9
2020-12-03   9
2020-12-04  11
2020-12-05  12
2020-12-06  12
2020-12-07  12

For the MultiIndex, this becomes:

df.sort_index(ascending=False).groupby('areaCode').cummin().sort_index()

Upvotes: 3

piterbarg
piterbarg

Reputation: 8219

Flipping the time axis and doing a rolling min does the trick:

df.sort_index(ascending = False).rolling(window=1000, min_periods=1).min().sort_index()

produces


            a
2020-12-01  5.0
2020-12-02  9.0
2020-12-03  9.0
2020-12-04  11.0
2020-12-05  12.0
2020-12-06  12.0
2020-12-07  12.0

Upvotes: 1

Related Questions