Reputation: 599
Find first interval containing a given number in a DataFrame.
I have a DataFrame in which each row represents an interval <start, stop>:
   start  stop
0      1     4
1      2     7
2      4    10
3      4    12
4      5    14
5      6    10
If I were looking for 8, I should get row 2, because it is the first interval that contains 8 (start <= 8 <= stop).
   start  stop
0      1     4
1      2     7
2      4    10   # <--- the first one to contain 8
3      4    12
4      5    14
5      6    10
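For reproducibility, the frame above can be built roughly like this (a minimal sketch; the variable name df is an assumption, chosen to match the answers below):

```python
import pandas as pd

# Sample intervals copied from the table above: one <start, stop> pair per row.
df = pd.DataFrame(
    {
        "start": [1, 2, 4, 4, 5, 6],
        "stop": [4, 7, 10, 12, 14, 10],
    }
)

print(df)
```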
I tried np.searchsorted, but with it I can only search on the start values. I don't want to use .iterrows(), because performance is crucial.
Upvotes: 0
Views: 100
Reputation: 42916
We can use Series.idxmax on a boolean Series:
(df["start"].le(8) & df["stop"].ge(8)).idxmax()
# out: 2
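One caveat worth noting: idxmax returns the first index label even when no row matches, because every value in the mask is then False. A minimal sketch of a guarded version follows; the helper name first_containing_label is hypothetical, and the sample data is copied from the question:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"start": [1, 2, 4, 4, 5, 6], "stop": [4, 7, 10, 12, 14, 10]})


def first_containing_label(df, x):
    """Return the index label of the first row with start <= x <= stop, or None.

    Hypothetical helper, not part of pandas.
    """
    mask = df["start"].le(x) & df["stop"].ge(x)
    # On an all-False mask idxmax would still return the first label,
    # so confirm that at least one row matched before trusting it.
    return mask.idxmax() if mask.any() else None


print(first_containing_label(df, 8))    # 2
print(first_containing_label(df, 100))  # None
```

Note that this returns the index label, which only equals the row position when the frame has a default RangeIndex.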
Upvotes: 1
Reputation: 120479
Use np.argmax
>>> np.argmax((df['start'].values <= 8) & (8 <= df['stop'].values))
2
Performance
%timeit np.argmax((df['start'].values <= 8) & (8 <= df['stop'].values))
9.31 µs ± 559 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit (df["start"].le(8) & df["stop"].ge(8)).idxmax()
382 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
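Two things to keep in mind with the NumPy version: np.argmax returns a row position rather than an index label, and it returns 0 when nothing matches. A minimal sketch that handles both, assuming a hypothetical helper name first_containing_pos and the sample data from the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"start": [1, 2, 4, 4, 5, 6], "stop": [4, 7, 10, 12, 14, 10]})


def first_containing_pos(df, x):
    """Return the position of the first row with start <= x <= stop, or -1.

    Hypothetical helper, not part of NumPy or pandas.
    """
    start = df["start"].to_numpy()
    stop = df["stop"].to_numpy()
    mask = (start <= x) & (x <= stop)
    if mask.size == 0:
        # np.argmax raises on an empty array, so handle an empty frame here.
        return -1
    pos = int(np.argmax(mask))
    # argmax returns 0 for an all-False mask, so verify the hit.
    return pos if mask[pos] else -1


pos = first_containing_pos(df, 8)
print(pos)           # 2
print(df.iloc[pos])  # positional lookup, independent of the index labels
```

If the frame does not have a default RangeIndex, df.index[pos] converts the position back to a label.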
Upvotes: 2