Keyreall

Reputation: 97

Find overlapped rows in Pandas Data Frame

What is the easiest way to convert the following data frame, sorted by start in ascending order:

   start   end
0    100   500
1    400   700
2    450   580
3    750   910
4    920   940
5   1000  1200
6   1100  1300

into

   start   end
0    100   700
1    750   910
2    920   940
3   1000  1300

Note that rows 0:3 and 5:7 (end-exclusive slices) were merged, because these rows overlap or one row lies entirely inside another. Each merged group is reduced to a single start and end.

Upvotes: 0

Views: 207

Answers (1)

mozway

Reputation: 260420

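Use a custom group with shift to identify the overlapping intervals, then keep the minimum start and the maximum end of each group. On a frame sorted by start, 'first' also works for start, but 'last' for end would give 580 instead of 700 for the first group, because row 2 lies inside row 1:

group = df['start'].gt(df['end'].shift()).cumsum()

out = df.groupby(group).agg({'start': 'min', 'end': 'max'})

output:

   start   end
0    100   700
1    750   910
2    920   940
3   1000  1300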

intermediate group:

0    0
1    0
2    0
3    1
4    2
5    3
6    3
dtype: int64
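
The grouping above compares each start only with the previous row's end. If a short row nested inside a longer one is followed by a row that still overlaps the longer one, that comparison can split the group too early. A minimal sketch of a more defensive variant, assuming integer start/end columns and that touching intervals (start equal to an earlier end) should be merged, compares against the running maximum of end instead. The helper name merge_overlaps is hypothetical and not part of the original answer:

```python
import pandas as pd


def merge_overlaps(df: pd.DataFrame) -> pd.DataFrame:
    """Merge overlapping [start, end] rows (hypothetical helper, a sketch only)."""
    # Sort so every row is compared against rows that start no later than it does.
    df = df.sort_values(["start", "end"], ignore_index=True)
    # Largest end seen in all earlier rows; NaN for the first row.
    prev_max_end = df["end"].cummax().shift()
    # A new group begins when a row starts after every earlier row has ended.
    # Comparing against NaN yields False, so the first row lands in group 0.
    group = df["start"].gt(prev_max_end).cumsum().rename("group")
    merged = df.groupby(group).agg(start=("start", "min"), end=("end", "max"))
    return merged.reset_index(drop=True)


df = pd.DataFrame({
    "start": [100, 400, 450, 750, 920, 1000, 1100],
    "end": [500, 700, 580, 910, 940, 1200, 1300],
})
out = merge_overlaps(df)
print(out)
```

On the question's data this yields the same four groups as the shift-based version. The two differ only on inputs such as (100, 700), (450, 580), (600, 800), where the running maximum keeps all three rows in one group.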

Upvotes: 1
