Rouzbeh Talebi
Rouzbeh Talebi

Reputation: 23

How to calculate the summation for values based on consecutive days and two other columns

How can I do summation just for consecutive days and for the same name and same supplier? For instance, for A and Supplier Wal, I need to do summation for 2021-05-31 and 2021-06-01 and then do another summation for 2021-06-08 and 2021-06-09. I need to add a new column for summation. Please take a look at the example below:

enter image description here

Here is the Pandas DataFrame code for the table:

df = pd.DataFrame({'Name': ['A', 'A', 'A','A','B','B','C','C','C','C','C','C','C','C','C'],
  'Supplier': ['Wal', 'Wal', 'Wal', 'Wal', 'Co', 'Co', 'Mc', 'Mc', 'St', 'St', 'St', 'St', 'St', 'To', 'To'],
  'Date': ['2021-05-31', '2021-06-01', '2021-06-08', '2021-06-09', '2021-05-17', '2021-05-18'
          , '2021-04-07', '2021-04-08', '2021-05-11', '2021-05-12', '2021-05-13', '2021-05-18'
          , '2021-05-19', '2021-03-30', '2021-03-31'],
  'Amount': [27, 400, 410, 250, 100, 50, 22, 78, 60, 180, 100, 240, 140, 30, 110],
  'Summation': [427,427,660,660,150,150,100,100,340,340,340,380,380,140,140 ]})

Upvotes: 0

Views: 70

Answers (1)

Wilian
Wilian

Reputation: 1257

Like this?

import pandas as pd

df = pd.DataFrame({'Name': ['A', 'A', 'A','A','B','B','C','C','C','C','C','C','C','C','C'],
  'Supplier': ['Wal', 'Wal', 'Wal', 'Wal', 'Co', 'Co', 'Mc', 'Mc', 'St', 'St', 'St', 'St', 'St', 'To', 'To'],
  'Date': ['2021-05-31', '2021-06-01', '2021-06-08', '2021-06-09', '2021-05-17', '2021-05-18'
          , '2021-04-07', '2021-04-08', '2021-05-11', '2021-05-12', '2021-05-13', '2021-05-18'
          , '2021-05-19', '2021-03-30', '2021-03-31'],
  'Amount': [27, 400, 410, 250, 100, 50, 22, 78, 60, 180, 100, 240, 140, 30, 110]})

df['Date'] = pd.to_datetime(df['Date'])
filt = df.loc[((df['Date'] - df['Date'].shift(-1)).abs() == pd.Timedelta('1d')) | (df['Date'].diff() == pd.Timedelta('1d'))]
breaks = filt['Date'].diff() != pd.Timedelta('1d')
df['Summation'] = df.groupby(['Name','Supplier',breaks.cumsum()])['Amount'].transform('sum')

print(df)

output:

   Name Supplier       Date  Amount  Summation
0     A      Wal 2021-05-31      27        427
1     A      Wal 2021-06-01     400        427
2     A      Wal 2021-06-08     410        660
3     A      Wal 2021-06-09     250        660
4     B       Co 2021-05-17     100        150
5     B       Co 2021-05-18      50        150
6     C       Mc 2021-04-07      22        100
7     C       Mc 2021-04-08      78        100
8     C       St 2021-05-11      60        340
9     C       St 2021-05-12     180        340
10    C       St 2021-05-13     100        340
11    C       St 2021-05-18     240        380
12    C       St 2021-05-19     140        380
13    C       To 2021-03-30      30        140
14    C       To 2021-03-31     110        140

Upvotes: 1

Related Questions