Mike Henderson
Mike Henderson

Reputation: 2142

Select rows that fall betwen ranges defined in another DataFrame

How can I get the rows of a dataframe that fit between the ranges of another dataframe? For example:

import pandas as pd

df1 = pd.DataFrame({
    'date': [
        pd.Timestamp(2019,1,1),
        pd.Timestamp(2019,1,2),
        pd.Timestamp(2019,1,3),
        pd.Timestamp(2019,2,1),
        pd.Timestamp(2019,2,5)
    ]
})

df2 = pd.DataFrame({
    'from_date': [pd.Timestamp(2019,1,1), pd.Timestamp(2019,2,1)],
    'to_date': [pd.Timestamp(2019,1,2), pd.Timestamp(2019,2,1)]
})

Data:

> df1
    date
0   2019-01-01   <- I want this
1   2019-01-02   <- and this
2   2019-01-03   
3   2019-02-01   <- and this
4   2019-02-05

> df2
    from_date   to_date
0   2019-01-01  2019-01-02
1   2019-02-01  2019-02-01

The ranges can overlap each other. I want find all rows in df1 that fall between any of the ranges in df2. I tried:

df1[df1['date'].between(df2['from_date'], df2['to_date'])]

But that resulted in an error:

ValueError: Can only compare identically-labeled Series objects

Upvotes: 4

Views: 94

Answers (3)

Chris Adams
Chris Adams

Reputation: 18647

Here is another approach:

1) Create an array of dates using list comprehension, numpy.hstack and pandas.date_range.

2) Simple boolean indexing on df1 using this array of dates and Series.isin

# step 1
dates = np.hstack([pd.date_range(s, e) for s, e in zip(df2['from_date'], df2['to_date'])])

# Step 2
df1[df1.date.isin(dates)]

        date
0 2019-01-01
1 2019-01-02
3 2019-02-01

Upvotes: 2

Scott Boston
Scott Boston

Reputation: 153510

Another way which is not advised for large dataframes is to create a cartesian product and filter the results:

import pandas as pd

df1 = pd.DataFrame({
    'date': [
        pd.Timestamp(2019,1,1),
        pd.Timestamp(2019,1,2),
        pd.Timestamp(2019,1,3),
        pd.Timestamp(2019,2,1),
        pd.Timestamp(2019,2,5)
    ]
})

df2 = pd.DataFrame({
    'from_date': [pd.Timestamp(2019,1,1), pd.Timestamp(2019,2,1)],
    'to_date': [pd.Timestamp(2019,1,2), pd.Timestamp(2019,2,1)]
})

df1 = df1.apply(pd.to_datetime)

df2 = df2.apply(pd.to_datetime)

df_out = df1.assign(key=1).merge(df2.assign(key=1))\
            .query('from_date <= date <= to_date')

df_out

Output:

        date  key  from_date    to_date
0 2019-01-01    1 2019-01-01 2019-01-02
2 2019-01-02    1 2019-01-01 2019-01-02
7 2019-02-01    1 2019-02-01 2019-02-01

Upvotes: 2

BENY
BENY

Reputation: 323366

I am using numpy broadcast

s2_1=df2.from_date.values
s2_2=df2.to_date.values
s1=df1.values[:,None]
df1[np.any((s1>=s2_1)&(s1<=s2_2),-1)]
Out[35]: 
        date
0 2019-01-01
1 2019-01-02
3 2019-02-01

Upvotes: 3

Related Questions