padrian92
padrian92

Reputation: 147

Select rows where column values are between a given range

How to find and remove rows from DataFrame with values in a specific range, for example dates greater than '2017-03-02' and smaller than '2017-03-05'

import pandas as pd                                     

d_index = pd.date_range('2018-01-01', '2018-01-06')     
d_values = pd.date_range('2017-03-01', '2017-03-06')    

s = pd.Series(d_values)                                 
s = s.rename('values')                                  

df = pd.DataFrame(s)                                    
df = df.set_index(d_index)                              

# remove rows with specific values in 'value' column                              

In example above I have d_values ordered from earliest to the latest date so in this case slicing dataframe by index could do the work. But I am looking for solution that would work also when d_values contain not ordered random date values. Is there any way to do it in pandas?

Upvotes: 2

Views: 4489

Answers (2)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210952

first let's shuffle your DF:

In [65]: df = df.sample(frac=1)

In [66]: df
Out[66]:
               values
2018-01-03 2017-03-03
2018-01-04 2017-03-04
2018-01-01 2017-03-01
2018-01-06 2017-03-06
2018-01-05 2017-03-05
2018-01-02 2017-03-02

you can use DataFrame.eval method (thanks @ cᴏʟᴅsᴘᴇᴇᴅ for the correction!):

In [70]: df[~df.eval("'2017-03-02' < values < '2017-03-05'")]
Out[70]:
               values
2018-01-01 2017-03-01
2018-01-06 2017-03-06
2018-01-05 2017-03-05
2018-01-02 2017-03-02

or DataFrame.query():

In [300]: df.query("not ('2017-03-02' < values < '2017-03-05')")
Out[300]:
               values
2018-01-01 2017-03-01
2018-01-06 2017-03-06
2018-01-05 2017-03-05
2018-01-02 2017-03-02

Upvotes: 2

cs95
cs95

Reputation: 403012

Option 1
pd.Series.between seems suited for this task.

df[~df['values'].between('2017-03-02', '2017-03-05', inclusive=False)]

               values
2018-01-01 2017-03-01
2018-01-02 2017-03-02
2018-01-05 2017-03-05
2018-01-06 2017-03-06

Details
between identifies all items within the range -

m = df['values'].between('2017-03-02', '2017-03-05', inclusive=False)
m

2018-01-01    False
2018-01-02    False
2018-01-03     True
2018-01-04     True
2018-01-05    False
2018-01-06    False
Freq: D, Name: values, dtype: bool

Use the mask to filter on df -

df = df[~m]

Option 2
Alternatively, with the good ol' old logical OR -

df[~(df['values'].gt('2017-03-02') & df['values'].lt('2017-03-05'))]

               values
2018-01-01 2017-03-01
2018-01-02 2017-03-02
2018-01-05 2017-03-05
2018-01-06 2017-03-06

Note that both options work with datetime objects as well as string date columns (in which case, the comparison is lexicographic).

Upvotes: 5

Related Questions