How to combine consecutive data in a dataframe and add up value

Question

I have a dataframe :

 Type:  Volume: Date:
 Q     10      2016.6.1
 Q     20      2016.6.1 
 T     10      2016.6.2 
 Q     10      2016.6.3
 T     20      2016.6.4
 T     20      2016.6.5
 Q     10      2016.6.6

Note that the date for the two consecutive T's are different, and I want to take the first date

and I want to combine type T to one row and add up volume only if two(or more) Ts are consecutive

i.e. to :

 Q     10      2016.6.1
 Q     20      2016.6.1 
 T     10      2016.6.2 
 Q     10      2016.6.3
 T     20+20=40 2016.6.4
 Q     10      2016.6.6

The code im using right now is:

df.groupby(by = [df.Type.ne('T').cumsum(),'Price', 'Time', 'Type'], as_index = False)['Volume'].sum()

However, this code only works when the date of the consecutive Ts are the same. Do you know how to combine consecutive T with different date, and only take the first date?

javidcf · Accepted Answer

import numpy as np
import pandas as pd

df = pd.DataFrame({"Type":   ["Q", "Q", "T", "Q", "T", "T", "Q"],
                   "Volume": [10,   20,  10,  10,  20,  20,  10],
                   "Date":   ["2016-06-01", "2016-06-01", "2016-06-02", "2016-06-03",
                              "2016-06-04", "2016-06-05", "2016-06-06"]})
df["Date"] = pd.to_datetime(df["Date"])

res = df.groupby(by = [df.Type.ne('T').cumsum(), 'Type'], as_index=False).agg({'Volume': 'sum', 'Date': 'first'})
print(res)

Output:

  Type       Date  Volume
0    Q 2016-06-01      10
1    Q 2016-06-01      20
2    T 2016-06-02      10
3    Q 2016-06-03      10
4    T 2016-06-04      40
5    Q 2016-06-06      10

How to combine consecutive data in a dataframe and add up value

Answers (1)

Related Questions