mrsquid
mrsquid

Reputation: 635

Python Pandas Find all weekends/dates between two date columns

Say I have a data frame with two columns

Start      End
1/1/2015  1/5/2015
1/10/2015 1/12/2015

What would be the best method to get the dates in between the start and end (the actual dates, not the number of days in between) including the start and end dates.

For example I would get 1/1/2015, 1/2/2015, ..., 1/5/2015.

The reason I want to do this is to find out how many weekend are between two dates.

Here is the example data frame plus the code to parse dates quickly.

def lookup(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    dates = {date:pd.to_datetime(date) for date in s.unique()}
    return s.map(dates)

df = pd.DataFrame({"Start": ["1/1/2015", "1/10/2015"], "End": ["1/5/2015", "1/12/2015"]})
df["Start"] = lookup(df["Start"])
df["End"] = lookup(df["End"])

If someone knows a better way to do this please let me know as I think there is a better way to find the number of weekends between two dates.

I have tried to understand the pd.date_range() function and have tried applying it like this.

df["between"] = pd.date_range(df["Start"], df["End"])

But get an error saying it cannot convert the input, I know I am using this function incorrectly. I think I need to use apply but am unsure how to use it with this function.

Any help is appreciated. Please let me know if you need any more information.

Thank you for your time.

Upvotes: 2

Views: 2842

Answers (1)

MattR
MattR

Reputation: 5126

you can leverage the built-in dataoffsets that pandas uses. bdate_range() is going to be your friend here

# create a dataframe of dates
df = pd.DataFrame({'Dates': pd.date_range("2015-01-01", "2019-08-01")})

# create a series of business days
busines_dates = pd.bdate_range("2015-01-01", "2019-08-30")

# find where the two do not intersect
df.loc[~df['Dates'].isin(busines_dates)]

I feel like you may want this as a funciton based on your question. Here is a basic one:

def weekends(start, end):
    df = pd.DataFrame({'Dates': pd.date_range(start, end)})
    busines_dates = pd.bdate_range(start, end)
    answer = df.loc[~df['Dates'].isin(busines_dates)]
    print("There are", answer.shape[0], 'weekends between', start, 'and', end)
    return answer

weekends("2015-01-01", "2019-01-01")


There are 418 weekends between 2015-01-01 and 2019-01-01
          Dates
2    2015-01-03
3    2015-01-04
9    2015-01-10
10   2015-01-11
16   2015-01-17
...         ...
1445 2018-12-16
1451 2018-12-22
1452 2018-12-23
1458 2018-12-29
1459 2018-12-30

Upvotes: 4

Related Questions