Saedeas

Reputation: 1578

How to find multiple row pairings (with potential rows between the members of a pairing)

Sorry, I realize the question title is somewhat vague; I wasn't sure how to phrase it. Hopefully this example will make clear what I'm trying to do.

I have a dataframe like so:

    Time    Event
0    .1      start
1    .2      end
2    .3      start
3    .4      foo
4    .5      bar
5    .6      end

I am trying to get the indices of the corresponding start and end rows (so in this example, the pairs (0,1) and (2,5)).

I'm not entirely sure how to write a vectorized version of this. I could do it fairly easily with a loop, but I'm trying to avoid that.

Assumptions that can be made:

- Every start has an associated end

- Starts and ends can have other events between them

Any help would be appreciated.

Edit: I realized my question missed one key case: an event can start within another event. So for a dataframe like:

    Time    Event
0    .1      start
1    .2      start
2    .3      end
3    .4      foo
4    .5      bar
5    .6      end

the results should be (0, 2) and (1, 5).

I realize this completely changes the nature of the question. Sorry for not mentioning it earlier. Would a stack be the easiest way to handle this?
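
For reference, the non-vectorized version I have in mind is roughly the following sketch (it pairs each end with the earliest unmatched start so the output matches the pairs above; a true stack, popping the most recent start, would give (1, 2) and (0, 5) instead):

from collections import deque

import pandas as pd

df = pd.DataFrame({'Time': [.1, .2, .3, .4, .5, .6],
                   'Event': ['start', 'start', 'end', 'foo', 'bar', 'end']})

pairs = []
open_starts = deque()                    # indices of starts still waiting for an end
for idx, event in df['Event'].items():
    if event == 'start':
        open_starts.append(idx)
    elif event == 'end':
        pairs.append((open_starts.popleft(), idx))   # pair with the earliest open start

print(pairs)  # [(0, 2), (1, 5)]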

Upvotes: 1

Views: 39

Answers (2)

Vaishali

Reputation: 38415

Get the indices in two separate lists, then zip them to create the tuples:

start = df[df['Event'] == 'start'].index   # indices of all 'start' rows
end = df[df['Event'] == 'end'].index       # indices of all 'end' rows
[(i1, i2) for i1, i2 in zip(start, end) if i1 < i2]

You get

[(0, 1), (2, 5)]
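
For completeness, a self-contained run on the original frame (this just rebuilds the example data; the logic is the snippet above, unchanged):

import pandas as pd

df = pd.DataFrame({'Time': [.1, .2, .3, .4, .5, .6],
                   'Event': ['start', 'end', 'start', 'foo', 'bar', 'end']})

start = df[df['Event'] == 'start'].index
end = df[df['Event'] == 'end'].index
pairs = [(i1, i2) for i1, i2 in zip(start, end) if i1 < i2]
print(pairs)  # [(0, 1), (2, 5)]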

It also handles the case where starts and ends don't strictly alternate. With your updated dataframe,

    Time    Event
0    .1      start
1    .2      start
2    .3      end
3    .4      foo
4    .5      bar
5    .6      end

this solution gives

[(0, 2), (1, 5)]

Upvotes: 2

BENY

Reputation: 323226

Try this one?

df['Group'] = df.Event.eq('start').cumsum()      # each 'start' opens a new group
df.groupby('Group').agg(['idxmin', 'idxmax'])    # row labels of the min and max Time per group
Out[797]: 
        Time       
      idxmin idxmax
Group              
1          0      1
2          2      5

Or you can get the output as a list of lists:

df.groupby('Group').agg(['idxmin','idxmax']).values.tolist()
Out[800]: [[0, 1], [2, 5]]
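
If you prefer tuples, to match the pairs in the question, wrap the same call (a small post-processing step; nothing else changes):

[tuple(p) for p in df.groupby('Group').agg(['idxmin','idxmax']).values.tolist()]
# [(0, 1), (2, 5)]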

Upvotes: 2
