Rafaó

Reputation: 599

Find first interval containing a given number in a DataFrame


Each row of the DataFrame represents a closed interval [start, stop]:

   start  stop
0      1     4
1      2     7
2      4    10
3      4    12
4      5    14
5      6    10

If I look for 8, I should get row 2, because it is the first interval that contains 8 (start <= 8 <= stop):

   start  stop
0      1     4
1      2     7
2      4    10   # <--- the first one to contain 8
3      4    12
4      5    14
5      6    10

I tried np.searchsorted, but it can only search on the start values. I'd rather avoid .iterrows(), because performance is crucial.
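For what it's worth, np.searchsorted can still narrow the search, but only under an extra assumption: that `start` is sorted ascending, as it happens to be in the example. With `side="right"` it returns how many rows have `start <= x`, so only that prefix needs its `stop` checked. This is a sketch, not a general solution: the `stop` check is still linear in the prefix length, and the helper name `first_containing_sorted` is hypothetical.

```python
import numpy as np
import pandas as pd


def first_containing_sorted(df: pd.DataFrame, x) -> int:
    """Position of the first row with start <= x <= stop, or -1 if none.

    ASSUMPTION: df["start"] is sorted in ascending order.
    """
    start = df["start"].to_numpy()
    stop = df["stop"].to_numpy()
    # Because start is sorted, rows 0..k-1 are exactly the rows with start <= x.
    k = np.searchsorted(start, x, side="right")
    # Among those candidate rows, find the first whose stop is >= x.
    hits = np.flatnonzero(stop[:k] >= x)
    return int(hits[0]) if hits.size else -1


df = pd.DataFrame({"start": [1, 2, 4, 4, 5, 6], "stop": [4, 7, 10, 12, 14, 10]})
print(first_containing_sorted(df, 8))   # 2
print(first_containing_sorted(df, 20))  # -1
```

If `start` is not sorted, the prefix returned by searchsorted is meaningless and the plain boolean-mask approach should be used instead.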

Upvotes: 0

Views: 100

Answers (2)

Erfan

Reputation: 42916

Build a boolean mask and use Series.idxmax, which returns the index label of the first True value:

(df["start"].le(8) & df["stop"].ge(8)).idxmax()

# out: 2
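One caveat worth guarding against: if no interval contains the number, the mask is all False and idxmax still returns the first index label (here 0), which looks like a valid hit. A minimal sketch of a guarded version, assuming `None` is an acceptable "not found" value; the name `first_containing_label` is hypothetical.

```python
import pandas as pd


def first_containing_label(df: pd.DataFrame, x):
    """Index label of the first row with start <= x <= stop, or None if none."""
    mask = df["start"].le(x) & df["stop"].ge(x)
    # idxmax on an all-False mask returns the first label, so check for a hit first.
    return mask.idxmax() if mask.any() else None


df = pd.DataFrame({"start": [1, 2, 4, 4, 5, 6], "stop": [4, 7, 10, 12, 14, 10]})
print(first_containing_label(df, 8))   # 2
print(first_containing_label(df, 20))  # None
```

Note that idxmax returns a label, not a position; the two only coincide with a default RangeIndex like the one in the question.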

Upvotes: 1

Corralien

Reputation: 120479

Use np.argmax on the underlying NumPy arrays; it returns the position of the first True:

>>> np.argmax((df['start'].values <= 8) & (8 <= df['stop'].values))
2
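np.argmax has the same blind spot as idxmax: on an all-False mask it returns 0, and on an empty array it raises ValueError. A sketch of a guarded positional version, assuming -1 is used as the "not found" marker; the helper name `first_containing_pos` is hypothetical.

```python
import numpy as np
import pandas as pd


def first_containing_pos(df: pd.DataFrame, x) -> int:
    """Position of the first row with start <= x <= stop, or -1 if none."""
    start = df["start"].to_numpy()
    stop = df["stop"].to_numpy()
    mask = (start <= x) & (x <= stop)
    if mask.size == 0:
        return -1  # np.argmax raises ValueError on an empty array
    i = int(np.argmax(mask))
    # np.argmax returns 0 when mask is all False, so confirm the hit is real.
    return i if mask[i] else -1


df = pd.DataFrame({"start": [1, 2, 4, 4, 5, 6], "stop": [4, 7, 10, 12, 14, 10]})
print(first_containing_pos(df, 8))   # 2
print(first_containing_pos(df, 20))  # -1
```

If the index label is needed rather than the position, `df.index[i]` converts one to the other.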

Performance

%timeit np.argmax((df['start'].values <= 8) & (8 <= df['stop'].values))
9.31 µs ± 559 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit (df["start"].le(8) & df["stop"].ge(8)).idxmax()
382 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
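If many numbers have to be looked up, the same NumPy mask can be broadcast over all of them at once rather than calling the function in a Python loop. This is a sketch under the assumption that a (number of queries) × (number of rows) boolean matrix fits in memory; `first_containing_many` is a hypothetical name, and no timings are claimed for it here.

```python
import numpy as np
import pandas as pd


def first_containing_many(df: pd.DataFrame, xs) -> np.ndarray:
    """For each x in xs, the position of the first row with start <= x <= stop, or -1.

    Builds a len(xs) x len(df) boolean matrix, so memory grows with both sizes.
    """
    start = df["start"].to_numpy()
    stop = df["stop"].to_numpy()
    xs = np.asarray(xs)[:, None]  # column vector, broadcasts against the rows
    mask = (start <= xs) & (xs <= stop)
    idx = mask.argmax(axis=1)
    # Queries with no matching row would report 0; mark them as -1 instead.
    idx[~mask.any(axis=1)] = -1
    return idx


df = pd.DataFrame({"start": [1, 2, 4, 4, 5, 6], "stop": [4, 7, 10, 12, 14, 10]})
print(first_containing_many(df, [8, 1, 20, 11]))  # [ 2  0 -1  3]
```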

Upvotes: 2
