Reputation: 93
I'm trying to select a range in a dataframe everytime x <= 20 and create a new dataframe with a colum which is the sum of the selected rows (negative value include) of that range and keep the last date as index. Thks for some hints and if I'm on the right way This is the output I want and below my approach which is not working.
Date x Date sum
2019-01-01 100 2019-01-05 343
2019-01-02 120 ---> 2019-01-10 804
2019-01-03 80 2019-01-15 650
2019-01-04 48 2019-01-20 428
2019-01-05 5 ...
2019-01-06 110 ...
2019-01-07 420
2019-01-08 140
2019-01-09 126
2019-01-10 8
2019-01-11 50
2019-01-12 160
2019-01-13 280
2019-01-14 148
2019-01-15 12
2019-01-16 190
2019-01-17 120
2019-01-18 80
2019-01-19 48
2019-01-20 -10
...
...
#######
for date in df.index.to_series().dt.date.unique():
for row in df.itertuples():
for i in row:
if i <= 20:
new_df = pd.DataFrame(columns=df.keys())
new_df.index.name = 'Date'
new_df ['sum'] = df.sum(axis = 0)
continue:
#
Upvotes: 0
Views: 840
Reputation: 26153
Why not use groupby?
df.groupby(df['x'].shift().le(20).cumsum()) \
.agg(Date=('Date','last'), sum = ('x','sum')).set_index('Date')
sum
Date
2019-01-05 353
2019-01-10 804
2019-01-15 650
2019-01-20 428
Upvotes: 2