Reputation: 1578
Sorry, I realize the question title is somewhat vague; I wasn't sure how to phrase it. Hopefully this example will make what I'm trying to do clear.
I have a dataframe like so:
  Time  Event
0   .1  start
1   .2    end
2   .3  start
3   .4    foo
4   .5    bar
5   .6    end
I am trying to get the indices of the corresponding start and end rows (so in this example, the pairs (0,1) and (2,5)).
I'm not entirely sure how to write a vectorized version of this. I could do it fairly easily in a loop, but I'm trying to avoid that.
Assumptions that can be made:
- Every start has an associated end
- Starts and ends can have other events between them
Any help would be appreciated.
Edit: I realized my question missed one key case. I can have an event that starts within another event. So a dataframe like:
  Time  Event
0   .1  start
1   .2  start
2   .3    end
3   .4    foo
4   .5    bar
5   .6    end
where the results should be (0,2), (1,5)
I realize this completely changes the nature of the question. Sorry for not mentioning it earlier. Would a stack be the easiest way to handle this?
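(For reference, the plain-loop version I'd like to avoid would look roughly like this, with the second frame rebuilt by hand. Popping pending starts from the left reproduces the pairing above; a true LIFO stack pop would instead pair each end with the most recent start, giving (1, 2) and (0, 5).)

import pandas as pd
from collections import deque

# second example frame, rebuilt by hand
df = pd.DataFrame({'Time': [.1, .2, .3, .4, .5, .6],
                   'Event': ['start', 'start', 'end', 'foo', 'bar', 'end']})

pending = deque()   # indices of starts that haven't been closed yet
pairs = []
for idx, event in df['Event'].items():
    if event == 'start':
        pending.append(idx)
    elif event == 'end':
        # popleft() matches starts and ends in order -> (0, 2), (1, 5)
        pairs.append((pending.popleft(), idx))

print(pairs)  # [(0, 2), (1, 5)]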
Upvotes: 1
Views: 39
Reputation: 38415
Get the start and end indices as two separate lists, then zip them together to form the pairs:
start = df[(df['Event'] == 'start')].index
end = df[(df['Event'] == 'end')].index
[(i1, i2) for i1, i2 in zip(start, end) if i1 < i2]
You get
[(0, 1), (2, 5)]
It handles unevenly interleaved start/end rows as well. With your updated dataframe,
  Time  Event
0   .1  start
1   .2  start
2   .3    end
3   .4    foo
4   .5    bar
5   .6    end
this solution gives
[(0, 2), (1, 5)]
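For a quick check, here is the same idea as a self-contained snippet (pandas imported as pd, your updated frame rebuilt by hand):

import pandas as pd

df = pd.DataFrame({'Time': [.1, .2, .3, .4, .5, .6],
                   'Event': ['start', 'start', 'end', 'foo', 'bar', 'end']})

# the nth start index is paired with the nth end index; the i1 < i2 guard
# drops any pair where an end would come before its start
start = df[df['Event'] == 'start'].index
end = df[df['Event'] == 'end'].index
print([(i1, i2) for i1, i2 in zip(start, end) if i1 < i2])
# [(0, 2), (1, 5)]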
Upvotes: 2
Reputation: 323226
Try this one?
df['Group'] = df.Event.eq('start').cumsum()
df.groupby('Group').agg(['idxmin','idxmax'])
Out[797]:
        Time
      idxmin idxmax
Group
1          0      1
2          2      5
Or, if you want a list output:
df.groupby('Group').agg(['idxmin','idxmax']).values.tolist()
Out[800]: [[0, 1], [2, 5]]
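If you prefer tuples, you can also select the Time column explicitly before aggregating (this relies on Time increasing within each group, as it does in your example, so idxmin/idxmax point at the first and last row of each group):

df['Group'] = df.Event.eq('start').cumsum()   # running count of 'start' rows: 1, 1, 2, 2, 2, 2
pairs = [tuple(p) for p in df.groupby('Group')['Time'].agg(['idxmin', 'idxmax']).values]
# [(0, 1), (2, 5)]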
Upvotes: 2