Reputation: 1961
I'm trying to use two columns, start and stop, to define multiple ranges of values in another dataframe's age column. Ranges are defined in a df called intervals:
start stop
1 3
5 7
Ages are defined in another df:
age some_random_value
1 100
2 200
3 300
4 400
5 500
6 600
7 700
8 800
9 900
10 1000
Desired output is the rows where age falls within the ranges defined in intervals (1-3 and 5-7, inclusive):
age some_random_value
1 100
2 200
3 300
5 500
6 600
7 700
I've tried using numpy.r_ but it doesn't work quite as I want it to:
df.age.loc[pd.np.r_[intervals.start, intervals.stop]]
Which yields:
age some_random_value
2 200
6 600
4 400
8 800
Any ideas are much appreciated!
Upvotes: 3
Views: 571
Reputation: 863226
I believe you need the parameter closed='both' in IntervalIndex.from_arrays, so that both endpoints of each range are included:
# df2 is the DataFrame holding the start/stop columns (called intervals in the question)
intervals = pd.IntervalIndex.from_arrays(df2['start'], df2['stop'], closed='both')
And then select matching values:
df = df[intervals.get_indexer(df.age.values) != -1]
print (df)
age some_random_value
0 1 100
1 2 200
2 3 300
4 5 500
5 6 600
6 7 700
Detail (run on the original df, before filtering):
print (intervals.get_indexer(df.age.values))
[ 0 0 0 -1 1 1 1 -1 -1 -1]
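For anyone who wants to run this end to end, here is a minimal self-contained sketch of the approach above. The two frames are rebuilt by hand from the question's sample data, and the names df2, positions and result are only illustrative. It assumes the ranges do not overlap; as far as I know, get_indexer raises an error on overlapping intervals, in which case get_indexer_non_unique would be the thing to look at.

```python
import pandas as pd

# Rebuild the sample data from the question.
# df2 stands in for the DataFrame the question calls `intervals`.
df2 = pd.DataFrame({"start": [1, 5], "stop": [3, 7]})
ages = list(range(1, 11))
df = pd.DataFrame({"age": ages, "some_random_value": [a * 100 for a in ages]})

# closed="both" makes each interval include its start and stop values.
intervals = pd.IntervalIndex.from_arrays(df2["start"], df2["stop"], closed="both")

# For each age, get_indexer gives the position of the interval that
# contains it, or -1 when no interval does.
positions = intervals.get_indexer(df["age"].to_numpy())

# Keep only the rows whose age landed in some interval.
result = df[positions != -1]
print(result)
```

The mask positions != -1 is what does the filtering; the positions themselves also tell you which range each row matched, if you need that later.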
Upvotes: 4