Reputation: 895
I have a sequentially ordered dataframe representing two event types measured over time; the measurements are the start and end times of each event. The rows should follow an ABABAB sequence, but in some cases there may be consecutive events of the same type (e.g. ABABAABABB). I am looking for a way to compare each row's event label (A or B) with the previous row's label, and if they are the same, merge the rows so that I keep the start time of the first event and the end time of the second. Consider the following:
import pandas as pd

myDF = pd.DataFrame({"Event": ["A", "B", "A", "A", "B", "B", "A"],
                     "Start": [1, 3, 5, 7, 9, 11, 13],
                     "End": [2, 4, 6, 8, 10, 12, 14]})
What I currently have...
==============================
Event Start End
==============================
A 1 2
B 3 4
A 5 6
A 7 8
B 9 10
B 11 12
A 13 14
==============================
What I need...
Note: The two A events at index position 2-3 have been merged into one, as have the two B events originally at positions 4-5.
==============================
Event Start End
==============================
A 1 2
B 3 4
A 5 8
B 9 12
A 13 14
==============================
I had initially thought to use groupby, but I don't think this is right, as it will group over the entire dataframe rather than over consecutive runs. Similarly, I have tried using iteritems but have not had any success. Apologies for the lack of code, but I'm at a loss as to how to approach the problem.
Upvotes: 0
Views: 221
Reputation: 61
Another way is to loop over the rows, merging each row into the following one whenever the labels match:

for i in range(1, myDF.shape[0]):
    if myDF['Event'][i] == myDF['Event'][i-1]:
        # Keep the earliest start and latest end, then drop the earlier row
        myDF.loc[i, 'Start'] = min(myDF['Start'][i], myDF['Start'][i-1])
        myDF.loc[i, 'End'] = max(myDF['End'][i], myDF['End'][i-1])
        myDF.drop([i-1], inplace=True)
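A minimal sketch of the full round trip (assuming pandas is available), wrapped in a function and followed by a reset_index so the merged frame gets a contiguous 0-based index after the drops:

```python
import pandas as pd

def merge_consecutive(df: pd.DataFrame) -> pd.DataFrame:
    """Merge consecutive rows that share the same Event label."""
    df = df.copy()
    for i in range(1, len(df)):
        if df["Event"][i] == df["Event"][i - 1]:
            # Keep the earliest start and latest end, then drop the earlier row
            df.loc[i, "Start"] = min(df["Start"][i], df["Start"][i - 1])
            df.loc[i, "End"] = max(df["End"][i], df["End"][i - 1])
            df.drop([i - 1], inplace=True)
    return df.reset_index(drop=True)

myDF = pd.DataFrame({"Event": ["A", "B", "A", "A", "B", "B", "A"],
                     "Start": [1, 3, 5, 7, 9, 11, 13],
                     "End": [2, 4, 6, 8, 10, 12, 14]})
print(merge_consecutive(myDF))
```

The drop leaves gaps in the index (rows 2 and 4 disappear), which is why the reset_index at the end matters if you index positionally afterwards.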
Upvotes: 1
Reputation: 20689
You can use GroupBy.agg with first and last.
g = df["Event"].ne(df["Event"].shift()).cumsum()
df.groupby(g, as_index=False).agg({
    "Event": "first",
    "Start": "first",
    "End": "last"
})
Event Start End
0 A 1 2
1 B 3 4
2 A 5 8
3 B 9 12
4 A 13 14
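To see why the group key works, here is a short sketch (using the question's sample data) that prints g on its own: each row where the label differs from the previous row starts a new group, so the cumulative sum assigns one id per run of equal labels.

```python
import pandas as pd

df = pd.DataFrame({"Event": ["A", "B", "A", "A", "B", "B", "A"],
                   "Start": [1, 3, 5, 7, 9, 11, 13],
                   "End": [2, 4, 6, 8, 10, 12, 14]})

# ne(shift) is True at every row where the label changes (and at row 0,
# since anything != NaN); cumsum turns those change points into run ids.
g = df["Event"].ne(df["Event"].shift()).cumsum()
print(g.tolist())  # [1, 2, 3, 3, 4, 4, 5]
```

Rows 2-3 (the two A events) share id 3 and rows 4-5 (the two B events) share id 4, so the groupby merges exactly those runs.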
Upvotes: 1