Saedeas

Reputation: 1578

How to find multiple row pairings (with potential rows between the members of a pairing)

Sorry, I realize the question title is somewhat vague; I wasn't sure how to phrase it. Hopefully this example will make clear what I'm trying to do.

I have a dataframe like so:

    Time    Event
0    .1      start
1    .2      end
2    .3      start
3    .4      foo
4    .5      bar
5    .6      end

I am trying to get the indices of the corresponding start and end rows (so in this example, the pairs (0,1) and (2,5)).

I'm not entirely sure how to write a vectorized version of this. I could do it fairly easily with a loop, but I'm trying to avoid that.

Assumptions that can be made:

- Every start has an associated end

- Starts and ends can have other events between them

Any help would be appreciated.

Edit: I realized my question missed one key case: an event can start within another event. So for a dataframe like:

    Time    Event
0    .1      start
1    .2      start
2    .3      end
3    .4      foo
4    .5      bar
5    .6      end

the results should be (0, 2) and (1, 5).

I realize this completely changes the nature of the question. Sorry for not mentioning it earlier. Would a stack be the easiest way to handle this?
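
For reference, the non-vectorized version I have in mind is roughly the following sketch (it pairs each end with the earliest unmatched start so the output matches the pairs above; a true stack, popping the most recent start, would give (1, 2) and (0, 5) instead):

from collections import deque

import pandas as pd

df = pd.DataFrame({'Time': [.1, .2, .3, .4, .5, .6],
                   'Event': ['start', 'start', 'end', 'foo', 'bar', 'end']})

pairs = []
open_starts = deque()                    # indices of starts still waiting for an end
for idx, event in df['Event'].items():
    if event == 'start':
        open_starts.append(idx)
    elif event == 'end':
        pairs.append((open_starts.popleft(), idx))   # pair with the earliest open start

print(pairs)  # [(0, 2), (1, 5)]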

Upvotes: 1

Views: 39

Answers (2)

Vaishali

Reputation: 38415

Get the indices in two separate lists, then zip them to create the tuples:

start = df[df['Event'] == 'start'].index   # indices of all 'start' rows
end = df[df['Event'] == 'end'].index       # indices of all 'end' rows
[(i1, i2) for i1, i2 in zip(start, end) if i1 < i2]

You get

[(0, 1), (2, 5)]
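
For completeness, a self-contained run on the original frame (this just rebuilds the example data; the logic is the snippet above, unchanged):

import pandas as pd

df = pd.DataFrame({'Time': [.1, .2, .3, .4, .5, .6],
                   'Event': ['start', 'end', 'start', 'foo', 'bar', 'end']})

start = df[df['Event'] == 'start'].index
end = df[df['Event'] == 'end'].index
pairs = [(i1, i2) for i1, i2 in zip(start, end) if i1 < i2]
print(pairs)  # [(0, 1), (2, 5)]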

It also handles the case where starts and ends don't strictly alternate. With your updated dataframe,

    Time    Event
0    .1      start
1    .2      start
2    .3      end
3    .4      foo
4    .5      bar
5    .6      end

this solution gives

[(0, 2), (1, 5)]

Upvotes: 2

BENY

Reputation: 323226

Try this one?

df['Group'] = df.Event.eq('start').cumsum()      # each 'start' opens a new group
df.groupby('Group').agg(['idxmin', 'idxmax'])    # row labels of the min and max Time per group
Out[797]: 
        Time       
      idxmin idxmax
Group              
1          0      1
2          2      5

Or you can get the output as a list of lists:

df.groupby('Group').agg(['idxmin','idxmax']).values.tolist()
Out[800]: [[0, 1], [2, 5]]
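
If you prefer tuples, to match the pairs in the question, wrap the same call (a small post-processing step; nothing else changes):

[tuple(p) for p in df.groupby('Group').agg(['idxmin','idxmax']).values.tolist()]
# [(0, 1), (2, 5)]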

Upvotes: 2
