Reputation: 143
I need to do a rolling sum of a columns of a dataframe, verify if the sum of the rolling (2 to 5 lines) is = 0 and take all the lines from the rolling to flag them as "canceled".
I tried this :
df_sorted = df.sort_values('No Piece FI')
df_sorted['No Piece FI'] = df_sorted['No Piece FI'].astype(str)
df = df_sorted[df_sorted['No Piece FI'].str.startswith('11')]
rolling_sum = df['CP'].rolling(window=5, min_periods=2).sum().shift(-4)
df.loc[(rolling_sum == 0), 'Annulé'] = 'Canceled'
But when I do that the result didn't seems to works. Also I would like to see the lines number like that or at least the group :
Annulé | Lines | CP | Group (optionnal) |
---|---|---|---|
Not canceled | 8 | ||
Not canceled | 4 | ||
Canceled | 1 | 10 | 1 |
Canceled | 2 | -10 | 1 |
Not canceled | 3 | ||
Canceled | 1 | 15 | 2 |
Canceled | 2 | -5 | 2 |
Canceled | 3 | -10 | 2 |
Canceled | 1 | 8 | 3 |
Canceled | 2 | -8 | 3 |
Not canceled | 19 | ||
Not canceled | 3 | ||
Not canceled | 2 | ||
Canceled | 1 | -8 | 4 |
Canceled | 2 | 2 | 4 |
Canceled | 3 | 2 | 4 |
Canceled | 4 | 1 | 4 |
Canceled | 5 | 3 | 4 |
Not canceled | -1024 |
Here is what only the input would looks like :
CP |
---|
8 |
4 |
10 |
-10 |
15 |
-5 |
-10 |
8 |
-8 |
19 |
3 |
2 |
-8 |
2 |
2 |
1 |
3 |
How can I do that ? Thanks you.
Upvotes: -1
Views: 99
Reputation: 477
I used your example for the following code. It should help do what you want, except every line has a line number and a group number which makes things way easier. It's more of a partial answer as I haven't tested other inputs.
import pandas as pd
# Using your example dataframe
df = pd.DataFrame({'CP': [10, -10, 15, -5, -10, 8, -8, 19, 3, 2]})
output_df = pd.DataFrame(columns=['Annulé', 'Lines', 'CP', 'Group'])
line_counter = 1
output_line_counter = 0
group_counter = 1
rolling_sum = df['CP'].rolling(window=5, min_periods=2).sum()
for i in range(len(df)):
if rolling_sum[i] == 0:
output_df.loc[output_line_counter] = ['Canceled', line_counter, df.loc[i, 'CP'], group_counter]
output_df.loc[output_df.Group == group_counter, 'Annulé'] = "Canceled"
group_counter+= 1
output_line_counter += 1
line_counter = 1
else:
output_df.loc[output_line_counter] = ['', line_counter, df.loc[i, 'CP'], group_counter]
output_line_counter += 1
line_counter += 1
print(output_df)
Upvotes: 1
Reputation: 260890
If I understand correctly, you can use a cumsum
, identify the points in which the cumulated sum is null and backtrack the groups:
m = df['CP'].cumsum().eq(0)[::-1]
g = m.cumsum()
out = (df[m.cummax()] # keep only the rows until the sum cancels last
.assign(Annulé='Canceled',
Lines=g[::-1].groupby(g).cumcount().add(1))
)
Output:
CP Annulé Lines
0 10 Canceled 1
1 -10 Canceled 2
2 15 Canceled 1
3 -5 Canceled 2
4 -10 Canceled 3
5 8 Canceled 1
6 -8 Canceled 2
Upvotes: 0