Reputation: 195
I have a dataframe :
Type: Volume: Date:
Q 10 2016.6.1
Q 20 2016.6.1
T 10 2016.6.2
Q 10 2016.6.3
T 20 2016.6.4
T 20 2016.6.5
Q 10 2016.6.6
Note that the date for the two consecutive T's are different, and I want to take the first date
and I want to combine type T to one row and add up volume only if two(or more) Ts are consecutive
i.e. to :
Q 10 2016.6.1
Q 20 2016.6.1
T 10 2016.6.2
Q 10 2016.6.3
T 20+20=40 2016.6.4
Q 10 2016.6.6
The code im using right now is:
df.groupby(by = [df.Type.ne('T').cumsum(),'Price', 'Time', 'Type'], as_index = False)['Volume'].sum()
However, this code only works when the date of the consecutive Ts are the same. Do you know how to combine consecutive T with different date, and only take the first date?
Upvotes: 0
Views: 349
Reputation: 59701
import numpy as np
import pandas as pd
df = pd.DataFrame({"Type": ["Q", "Q", "T", "Q", "T", "T", "Q"],
"Volume": [10, 20, 10, 10, 20, 20, 10],
"Date": ["2016-06-01", "2016-06-01", "2016-06-02", "2016-06-03",
"2016-06-04", "2016-06-05", "2016-06-06"]})
df["Date"] = pd.to_datetime(df["Date"])
res = df.groupby(by = [df.Type.ne('T').cumsum(), 'Type'], as_index=False).agg({'Volume': 'sum', 'Date': 'first'})
print(res)
Output:
Type Date Volume
0 Q 2016-06-01 10
1 Q 2016-06-01 20
2 T 2016-06-02 10
3 Q 2016-06-03 10
4 T 2016-06-04 40
5 Q 2016-06-06 10
Upvotes: 1