Teddy
Teddy

Reputation: 143

Do a rolling sum of a columns using python

I need to do a rolling sum of a columns of a dataframe, verify if the sum of the rolling (2 to 5 lines) is = 0 and take all the lines from the rolling to flag them as "canceled".

I tried this :

df_sorted = df.sort_values('No Piece FI')
df_sorted['No Piece FI'] = df_sorted['No Piece FI'].astype(str)
df = df_sorted[df_sorted['No Piece FI'].str.startswith('11')]
rolling_sum = df['CP'].rolling(window=5, min_periods=2).sum().shift(-4)
df.loc[(rolling_sum == 0), 'Annulé'] = 'Canceled'

But when I do that the result didn't seems to works. Also I would like to see the lines number like that or at least the group :

Annulé Lines CP Group (optionnal)
Not canceled 8
Not canceled 4
Canceled 1 10 1
Canceled 2 -10 1
Not canceled 3
Canceled 1 15 2
Canceled 2 -5 2
Canceled 3 -10 2
Canceled 1 8 3
Canceled 2 -8 3
Not canceled 19
Not canceled 3
Not canceled 2
Canceled 1 -8 4
Canceled 2 2 4
Canceled 3 2 4
Canceled 4 1 4
Canceled 5 3 4
Not canceled -1024

Here is what only the input would looks like :

CP
8
4
10
-10
15
-5
-10
8
-8
19
3
2
-8
2
2
1
3

How can I do that ? Thanks you.

Upvotes: -1

Views: 99

Answers (2)

Luci
Luci

Reputation: 477

I used your example for the following code. It should help do what you want, except every line has a line number and a group number which makes things way easier. It's more of a partial answer as I haven't tested other inputs.

import pandas as pd

# Using your example dataframe
df = pd.DataFrame({'CP': [10, -10, 15, -5, -10, 8, -8, 19, 3, 2]})

output_df = pd.DataFrame(columns=['Annulé', 'Lines', 'CP', 'Group'])

line_counter = 1
output_line_counter = 0
group_counter = 1

rolling_sum = df['CP'].rolling(window=5, min_periods=2).sum()

for i in range(len(df)):
    if rolling_sum[i] == 0:
        output_df.loc[output_line_counter] = ['Canceled', line_counter, df.loc[i, 'CP'], group_counter]
        output_df.loc[output_df.Group == group_counter, 'Annulé'] = "Canceled"
        group_counter+= 1
        output_line_counter += 1
        line_counter = 1
    else:
        output_df.loc[output_line_counter] = ['', line_counter, df.loc[i, 'CP'], group_counter]
        output_line_counter += 1
        line_counter += 1

print(output_df)

Upvotes: 1

mozway
mozway

Reputation: 260890

If I understand correctly, you can use a cumsum, identify the points in which the cumulated sum is null and backtrack the groups:

m = df['CP'].cumsum().eq(0)[::-1]
g = m.cumsum()

out = (df[m.cummax()] # keep only the rows until the sum cancels last
       .assign(Annulé='Canceled',
               Lines=g[::-1].groupby(g).cumcount().add(1))
      )

Output:

   CP    Annulé  Lines
0  10  Canceled      1
1 -10  Canceled      2
2  15  Canceled      1
3  -5  Canceled      2
4 -10  Canceled      3
5   8  Canceled      1
6  -8  Canceled      2

Upvotes: 0

Related Questions