Michael Dorner

Reputation: 20125

Compare a series of intervals with itself

Given a Series of Interval objects:

import pandas as pd

s = pd.Series([
    pd.Interval(left=pd.Timestamp('2020-01-01'), right=pd.Timestamp('2020-01-05'), closed='both'),
    pd.Interval(left=pd.Timestamp('2020-01-01'), right=pd.Timestamp('2020-01-02'), closed='both'),
    pd.Interval(left=pd.Timestamp('2020-01-04'), right=pd.Timestamp('2020-01-05'), closed='both'),
])

I want to check, for every pair of intervals (like an outer product of the Series with itself), whether the two intervals overlap. Interval offers the method overlaps() for comparing a single pair.

For a Series of length l, the result should be an l x l matrix or DataFrame whose entries state whether the corresponding pair overlaps. For example:

+--------------------------+--------------------------+--------------------------+--------------------------+
|                          | [2020-01-01, 2020-01-05] | [2020-01-01, 2020-01-02] | [2020-01-04, 2020-01-05] |
+--------------------------+--------------------------+--------------------------+--------------------------+
| [2020-01-01, 2020-01-05] | True                     | True                     | True                     |
+--------------------------+--------------------------+--------------------------+--------------------------+
| [2020-01-01, 2020-01-02] | True                     | True                     | False                    |
+--------------------------+--------------------------+--------------------------+--------------------------+
| [2020-01-04, 2020-01-05] | True                     | False                    | True                     |
+--------------------------+--------------------------+--------------------------+--------------------------+

Because the Series is quite large, I'm looking for something faster than iterating over every pair with itertuples().
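For reference, the brute-force version of what is being asked might look like the sketch below. The function name pairwise_overlaps_naive is hypothetical and only serves as a baseline for checking faster approaches; it makes l * l Python-level calls to Interval.overlaps().

```python
import numpy as np
import pandas as pd

s = pd.Series([
    pd.Interval(pd.Timestamp('2020-01-01'), pd.Timestamp('2020-01-05'), closed='both'),
    pd.Interval(pd.Timestamp('2020-01-01'), pd.Timestamp('2020-01-02'), closed='both'),
    pd.Interval(pd.Timestamp('2020-01-04'), pd.Timestamp('2020-01-05'), closed='both'),
])


def pairwise_overlaps_naive(intervals):
    """Brute-force l x l overlap matrix (hypothetical helper, slow baseline)."""
    items = list(intervals)
    n = len(items)
    result = np.zeros((n, n), dtype=bool)
    for i, a in enumerate(items):
        for j, b in enumerate(items):
            # Interval.overlaps compares exactly one pair of intervals
            result[i, j] = a.overlaps(b)
    return result


naive = pairwise_overlaps_naive(s)
print(pd.DataFrame(naive, index=s.values, columns=s.values))
```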

Upvotes: 1

Views: 160

Answers (1)

Ben.T

Reputation: 29635

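You could build a pd.IntervalIndex from the Series to get the left and right bounds easily, then compare them pairwise with NumPy's ufunc.outer on greater_equal and less_equal. Two closed intervals overlap exactly when each one starts no later than the other one ends.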

import numpy as np
import pandas as pd

# work with an IntervalIndex
idx = pd.IntervalIndex(s)
# get the right and left bounds as plain NumPy arrays
right = idx.right.to_numpy()
left = idx.left.to_numpy()

# interval i overlaps interval j when right[i] >= left[j] and left[i] <= right[j]
# (the inclusive comparisons match closed='both')
arr = np.greater_equal.outer(right, left) & np.less_equal.outer(left, right)

# create the DataFrame if needed
print(pd.DataFrame(arr, index=s.values, columns=s.values))
                          [2020-01-01, 2020-01-05]  [2020-01-01, 2020-01-02]  \
[2020-01-01, 2020-01-05]                      True                      True   
[2020-01-01, 2020-01-02]                      True                      True   
[2020-01-04, 2020-01-05]                      True                     False   

                          [2020-01-04, 2020-01-05]  
[2020-01-01, 2020-01-05]                      True  
[2020-01-01, 2020-01-02]                     False  
[2020-01-04, 2020-01-05]                      True  
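The inclusive comparisons above fit closed='both'. If the intervals were closed on only one side or on neither, touching endpoints would no longer count as an overlap. A minimal sketch of that variant, written with NumPy broadcasting instead of ufunc.outer, could look like this; overlap_matrix is a hypothetical helper name, and it assumes a single IntervalIndex, in which all intervals share one closed value.

```python
import numpy as np
import pandas as pd


def overlap_matrix(idx):
    """Pairwise overlap matrix for an IntervalIndex (hypothetical helper)."""
    left = idx.left.to_numpy()
    right = idx.right.to_numpy()
    if idx.closed == 'both':
        # shared endpoints belong to both intervals, so compare inclusively
        return (left[:, None] <= right[None, :]) & (left[None, :] <= right[:, None])
    # for 'left', 'right' or 'neither', a shared endpoint is open on at
    # least one side, so the comparisons must be strict
    return (left[:, None] < right[None, :]) & (left[None, :] < right[:, None])


closed_idx = pd.IntervalIndex.from_arrays(
    pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-04']),
    pd.to_datetime(['2020-01-05', '2020-01-02', '2020-01-05']),
    closed='both',
)
half_open_idx = pd.IntervalIndex.from_breaks([0, 1, 2], closed='left')

closed_result = overlap_matrix(closed_idx)
half_open_result = overlap_matrix(half_open_idx)
```

Each comparison broadcasts a column vector against a row vector, so the result is an l x l boolean array, just like the ufunc.outer version.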

IntervalIndex also has an overlaps method, which tests every interval in the index against a single Interval, so you could build the matrix row by row with something like:

np.stack([idx.overlaps(interval) for interval in idx])
# or, as a DataFrame
pd.DataFrame([idx.overlaps(interval) for interval in idx],
             index=s.values, columns=s.values)
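To check that the two approaches agree, one option is to compare their results on the sample data. The sketch below rebuilds the index with IntervalIndex.from_arrays so that it runs on its own; the names via_bounds and via_overlaps are illustrative only.

```python
import numpy as np
import pandas as pd

idx = pd.IntervalIndex.from_arrays(
    pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-04']),
    pd.to_datetime(['2020-01-05', '2020-01-02', '2020-01-05']),
    closed='both',
)

left = idx.left.to_numpy()
right = idx.right.to_numpy()

# approach 1: compare the bounds pairwise
via_bounds = np.greater_equal.outer(right, left) & np.less_equal.outer(left, right)
# approach 2: one vectorized overlaps() call per interval
via_overlaps = np.stack([idx.overlaps(interval) for interval in idx])

same = np.array_equal(via_bounds, via_overlaps)
print(same)
```

The second approach still loops in Python once per interval, while the first does all the pairwise work inside NumPy. Both produce an l x l boolean array, so memory grows quadratically with the length of the Series either way.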

Upvotes: 2
