Reputation: 11110
I'm wondering if there's a succinct, broadcast way to take a series/frame such as this:
>>> print(pd.DataFrame({'a': [5, 10, 9, 11, 13, 14, 12]}, index=pd.date_range('2020-12-01', periods=7)))
a
2020-12-01 5
2020-12-02 10
2020-12-03 9
2020-12-04 11
2020-12-05 13
2020-12-06 14
2020-12-07 12
...and turn it into:
>>> print(pd.DataFrame({'a': [5, 9, 9, 11, 12, 12, 12]}, index=pd.date_range('2020-12-01', periods=7)))
a
2020-12-01 5
2020-12-02 9
2020-12-03 9
2020-12-04 11
2020-12-05 12
2020-12-06 12
2020-12-07 12
NB: The important part is that the most recent value is kept, and and previous values that exceed it are replaced with the current value goes backwards in time, resulting in a monotonically increasing series of numbers such that any modifications do not increase a modified value.
The actual data is hierarchical and has multiple columns, but I wanted to start with something I could get my head around:
any full
date areaCode
2020-12-08 E92000001 0.0 0.0
N92000002 0.0 0.0
S92000003 0.0 0.0
W92000004 0.0 0.0
2020-12-09 E92000001 11115.2 0.0
N92000002 724.6 0.0
S92000003 3801.8 0.0
W92000004 1651.4 0.0
...
2021-01-24 E92000001 5727693.0 441684.0
N92000002 159642.0 22713.0
S92000003 415402.0 5538.0
W92000004 270833.0 543.0
2021-01-25 E92000001 5962544.0 443010.0
Upvotes: 1
Views: 66
Reputation: 153500
Here's another way:
df.sort_index(ascending=False).cummin().sort_index()
a
2020-12-01 5
2020-12-02 9
2020-12-03 9
2020-12-04 11
2020-12-05 12
2020-12-06 12
2020-12-07 12
For the MultiIndex, this becomes:
df.sort_index(ascending=False).groupby('areaCode').cummin().sort_index()
Upvotes: 3
Reputation: 8219
Flipping the time axis and doing a rolling min does the trick:
df.sort_index(ascending = False).rolling(window=1000, min_periods=1).min().sort_index()
produces
a
2020-12-01 5.0
2020-12-02 9.0
2020-12-03 9.0
2020-12-04 11.0
2020-12-05 12.0
2020-12-06 12.0
2020-12-07 12.0
Upvotes: 1