Reputation: 20125
For a Series of Interval objects:
import pandas as pd

s = pd.Series([
    pd.Interval(left=pd.Timestamp('2020-01-01'), right=pd.Timestamp('2020-01-05'), closed='both'),
    pd.Interval(left=pd.Timestamp('2020-01-01'), right=pd.Timestamp('2020-01-02'), closed='both'),
    pd.Interval(left=pd.Timestamp('2020-01-04'), right=pd.Timestamp('2020-01-05'), closed='both'),
])
I want to check every pair of intervals, like an outer product, to see whether the two overlap. For that, Interval offers the method overlaps().
The result should be an l x l matrix/DataFrame for a Series of length l, indicating whether each pair overlaps. For example:
+--------------------------+--------------------------+--------------------------+--------------------------+
|                          | [2020-01-01, 2020-01-05] | [2020-01-01, 2020-01-02] | [2020-01-04, 2020-01-05] |
+--------------------------+--------------------------+--------------------------+--------------------------+
| [2020-01-01, 2020-01-05] | True                     | True                     | True                     |
+--------------------------+--------------------------+--------------------------+--------------------------+
| [2020-01-01, 2020-01-02] | True                     | True                     | False                    |
+--------------------------+--------------------------+--------------------------+--------------------------+
| [2020-01-04, 2020-01-05] | True                     | False                    | True                     |
+--------------------------+--------------------------+--------------------------+--------------------------+
Because the Series is quite large, I'm looking for a more performant approach than looping with itertuples().
Upvotes: 1
Views: 160
Reputation: 29635
You could use pd.IntervalIndex to get the right and left bounds easily, and then use NumPy's ufunc.outer with greater_equal and less_equal.
import numpy as np
import pandas as pd

# work with an IntervalIndex built from the Series s defined in the question
idx = pd.IntervalIndex(s)

# get the right and left bounds
right = idx.right
left = idx.left

# interval i overlaps interval j when right_i >= left_j and left_i <= right_j
arr = np.greater_equal.outer(right, left) & np.less_equal.outer(left, right)

# create the DataFrame if needed
print(pd.DataFrame(arr, index=s.values, columns=s.values))
                          [2020-01-01, 2020-01-05]  [2020-01-01, 2020-01-02]  \
[2020-01-01, 2020-01-05]                      True                      True
[2020-01-01, 2020-01-02]                      True                      True
[2020-01-04, 2020-01-05]                      True                     False

                          [2020-01-04, 2020-01-05]
[2020-01-01, 2020-01-05]                      True
[2020-01-01, 2020-01-02]                     False
[2020-01-04, 2020-01-05]                      True
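The comparison works because two closed intervals overlap exactly when each one starts no later than the other ends. As a rough sanity check, the sketch below rebuilds the matrix with plain NumPy broadcasting (swapped in here for ufunc.outer, which should give the same result) and compares it with the scalar Interval.overlaps() applied to every pair. The names vectorized and pairwise are only illustrative, and converting the bounds with to_numpy() is an assumption made so that the comparison runs on plain arrays.

```python
import numpy as np
import pandas as pd

# Same three closed intervals as in the question.
s = pd.Series([
    pd.Interval(pd.Timestamp('2020-01-01'), pd.Timestamp('2020-01-05'), closed='both'),
    pd.Interval(pd.Timestamp('2020-01-01'), pd.Timestamp('2020-01-02'), closed='both'),
    pd.Interval(pd.Timestamp('2020-01-04'), pd.Timestamp('2020-01-05'), closed='both'),
])

idx = pd.IntervalIndex(s)
left = idx.left.to_numpy()    # assumption: plain datetime64 arrays
right = idx.right.to_numpy()

# Broadcasting a column against a row compares every pair (i, j):
# interval i overlaps interval j when left_i <= right_j and left_j <= right_i.
vectorized = (left[:, None] <= right[None, :]) & (left[None, :] <= right[:, None])

# Reference result from the scalar API: l * l Python-level calls.
pairwise = np.array([[a.overlaps(b) for b in s] for a in s])

# Both approaches should agree on every cell.
assert (vectorized == pairwise).all()
```

The pairwise loop is only there for verification; on a large Series the broadcast version is the one to keep, bearing in mind that the result still needs l * l booleans of memory.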
It seems that you could also use overlaps on the IntervalIndex and do something like:
np.stack([idx.overlaps(interval) for interval in idx])
# or, for a DataFrame
pd.DataFrame([idx.overlaps(interval) for interval in idx],
             index=s.values, columns=s.values)
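One difference between the two approaches is worth keeping in mind: overlaps() takes the closed side of the intervals into account, while the greater_equal/less_equal comparison assumes closed='both'. The following is a minimal sketch, not a definitive implementation, of how the vectorized comparison might be adapted. The helper overlap_matrix is hypothetical, and it assumes that intervals touching at an endpoint only count as overlapping when both sides are closed, which is how pandas documents overlaps().

```python
import numpy as np
import pandas as pd

def overlap_matrix(idx: pd.IntervalIndex) -> np.ndarray:
    """Hypothetical helper: pairwise overlap matrix for one IntervalIndex."""
    left = idx.left.to_numpy()
    right = idx.right.to_numpy()
    if idx.closed == 'both':
        # Shared endpoints count as overlap only when both sides are closed.
        return (left[:, None] <= right[None, :]) & (left[None, :] <= right[:, None])
    # For 'left', 'right' or 'neither', a shared endpoint is not an overlap.
    return (left[:, None] < right[None, :]) & (left[None, :] < right[:, None])

# Half-open intervals [0, 1), [1, 2), [0, 2): the first two only touch at 1.
half_open = pd.IntervalIndex.from_tuples([(0, 1), (1, 2), (0, 2)], closed='left')

# Compare against the built-in IntervalIndex.overlaps, one row per interval.
expected = np.stack([half_open.overlaps(iv) for iv in half_open])
assert (overlap_matrix(half_open) == expected).all()
```

For the closed='both' Series in the question both branches are unnecessary and the original comparison is enough; the sketch only matters if the same code is reused on half-open or open intervals.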
Upvotes: 2