Plasma

Reputation: 1961

Pandas: Find values within multiple ranges defined by start- and stop-columns

I'm trying to use two columns, start and stop, to define multiple ranges of values for the age column of another DataFrame. The ranges are defined in a DataFrame called intervals:

start  stop
    1     3
    5     7

The ages are stored in another DataFrame, df:

age  some_random_value
  1                100
  2                200
  3                300
  4                400
  5                500
  6                600
  7                700
  8                800
  9                900
 10               1000

The desired output is the rows whose age falls within any of the ranges defined in intervals (1-3 and 5-7, endpoints included):

age  some_random_value
  1                100
  2                200
  3                300
  5                500
  6                600
  7                700

I've tried using numpy.r_ but it doesn't work quite as I want it to:

import numpy as np
df.loc[np.r_[intervals.start, intervals.stop]]

Which yields:

age  some_random_value
  2                200
  6                600
  4                400
  8                800
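The result above is consistent with np.r_ simply concatenating the two columns into one flat array, which .loc then looks up as labels in the default 0-based index instead of treating them as ranges of ages. A minimal sketch reproducing this, assuming the default RangeIndex and df.loc (the printed result shows both columns):

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question
intervals = pd.DataFrame({"start": [1, 5], "stop": [3, 7]})
df = pd.DataFrame({"age": range(1, 11),
                   "some_random_value": range(100, 1100, 100)})

# np.r_ only concatenates: all starts, then all stops -> [1, 5, 3, 7]
labels = np.r_[intervals.start, intervals.stop]
print(labels.tolist())  # [1, 5, 3, 7]

# .loc treats these as index labels, not as age ranges,
# so it returns the rows at index 1, 5, 3, 7 (ages 2, 6, 4, 8)
picked = df.loc[labels]
print(picked)
```

No range expansion happens anywhere, which is why only four rows come back.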

Any ideas are much appreciated!

Upvotes: 3

Views: 571

Answers (1)

jezrael

Reputation: 863226

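I believe you need the parameter closed='both' in IntervalIndex.from_arrays so that both endpoints of each range are included (here df2 is the intervals DataFrame from the question):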

intervals = pd.IntervalIndex.from_arrays(df2['start'], df2['stop'], closed='both')
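To illustrate why closed='both' matters, here is a small comparison against the default closed='right', under which the start values 1 and 5 are not matched. This is a sketch on the question's sample data; the names intervals_df and ages are chosen for illustration only:

```python
import numpy as np
import pandas as pd

intervals_df = pd.DataFrame({"start": [1, 5], "stop": [3, 7]})
ages = np.arange(1, 11)

# Default closed='right': intervals are (1, 3] and (5, 7], so 1 and 5 miss
right = pd.IntervalIndex.from_arrays(intervals_df["start"], intervals_df["stop"])

# closed='both': intervals are [1, 3] and [5, 7], so both endpoints match
both = pd.IntervalIndex.from_arrays(intervals_df["start"], intervals_df["stop"],
                                    closed="both")

# get_indexer returns the position of the matching interval, or -1 for no match
print(right.get_indexer(ages).tolist())  # [-1, 0, 0, -1, -1, 1, 1, -1, -1, -1]
print(both.get_indexer(ages).tolist())   # [0, 0, 0, -1, 1, 1, 1, -1, -1, -1]
```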

Then select the matching rows:

df = df[intervals.get_indexer(df.age.values) != -1]
print (df)
   age  some_random_value
0    1                100
1    2                200
2    3                300
4    5                500
5    6                600
6    7                700

Detail:

print (intervals.get_indexer(df.age.values))
[ 0  0  0 -1  1  1  1 -1 -1 -1]
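If the ranges could overlap, IntervalIndex.get_indexer is, as far as I know, not usable directly (it raises for overlapping intervals). A different technique, NumPy broadcasting, avoids that: compare every age against every start/stop pair and keep the rows that fall inside at least one range. This is a sketch rather than part of the answer above; intervals_df is an illustrative name, and the intermediate boolean matrix has shape (number of ranges, number of rows), so memory grows with both.

```python
import pandas as pd

intervals_df = pd.DataFrame({"start": [1, 5], "stop": [3, 7]})
df = pd.DataFrame({"age": range(1, 11),
                   "some_random_value": range(100, 1100, 100)})

ages = df["age"].to_numpy()
starts = intervals_df["start"].to_numpy()
stops = intervals_df["stop"].to_numpy()

# Shape (n_ranges, n_rows): True where a row's age lies in [start, stop]
inside = (ages >= starts[:, None]) & (ages <= stops[:, None])

# Keep a row if it falls in any range; overlapping ranges are harmless here
result = df[inside.any(axis=0)]
print(result)
```

The inclusive comparisons (>= and <=) play the same role as closed='both'.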

Upvotes: 4
