Nils Gudat
Nils Gudat

Reputation: 13800

numpy datetime and pandas datetime

I'm confused by the interoperation between numpy and pandas date objects (or maybe just by numpy's datetime64 in general).

I was trying to count business days using numpy's built-in functionality like so:

np.busday_count("2016-03-01", "2016-03-31", holidays=[np.datetime64("28/03/2016")])

However, numpy apparently can't deal with the inverted date format:

ValueError: Error parsing datetime string "28/03/2016" at position 2

To get around this, I thought I'd just use pandas to_datetime, which can. However:

np.busday_count("2016-03-01", "2016-03-31", holidays=[np.datetime64(pd.to_datetime("28/03/2016"))])

ValueError: Cannot safely convert provided holidays input into an array of dates

Searching around for a bit, it seemed that this was caused by the fact that the chaining of to_datetime and np.datetime64 results in a datetime64[us] object, which apparently the busday_count function cannot accept (is this intended behaviour or a bug?). Thus, my next attempt was:

np.busday_count("2016-03-01", "2016-03-31", holidays=[np.datetime64(pd.Timestamp("28"), "D")])

But:

TypeError: Cannot cast datetime.datetime object from metadata [us] to [D] according to the rule 'same_kind'

And that's me out - why are there so many incompatibilities between all these datetime formats? And how can I get around them?

Upvotes: 6

Views: 7578

Answers (2)

bearcub
bearcub

Reputation: 332

I've been having a similar issue, using np.is_busday()

The type of datetime64 is vital to get right. Checking the numpy datetime docs, you can specify the numpy datetime type to be D.

This works:

my_holidays=np.array([datetime.datetime.strptime(x,'%m/%d/%y') for x in holidays.Date.values], dtype='datetime64[D]')

day_flags['business_day'] = np.is_busday(days,holidays=my_holidays)

Whereas this throws the same error you got:

my_holidays=np.array([datetime.datetime.strptime(x,'%m/%d/%y') for x in holidays.Date.values], dtype='datetime64')

The only difference is specifying the type of datetime64.

dtype='datetime64[D]'

vs

dtype='datetime64'

Docs are here:

https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.datetime.html

Upvotes: 4

Jeril
Jeril

Reputation: 8521

I had the same issue while using np.busday_count, later I figured out the problem was with the hours, minutes, seconds, and milliseconds getting added while converting it to datetime object or numpy datetime object.

I just converted to datetime object with only date and not the hours, minutes, seconds, and milliseconds.

The following was my code:

holidays_list.json file:

{
    "holidays_2019": [
        "04-Mar-2019",
        "21-Mar-2019",
        "17-Apr-2019",
        "19-Apr-2019",
        "29-Apr-2019",
        "01-May-2019",
        "05-Jun-2019",
        "12-Aug-2019",
        "15-Aug-2019",
        "02-Sep-2019",
        "10-Sep-2019",
        "02-Oct-2019",
        "08-Oct-2019",
        "28-Oct-2019",
        "12-Nov-2019",
        "25-Dec-2019"
    ],
    "format": "%d-%b-%Y"
}

code file:

import json
import datetime
import numpy as np

with open('holidays_list.json', 'r') as infile:
    data = json.loads(infile.read())

# the following is where I convert the datetime object to date
holidays = list(map(lambda x: datetime.datetime.strptime(
    x, data['format']).date(), data['holidays_2019']))

start_date = datetime.datetime.today().date()
end_date = start_date + datetime.timedelta(days=30)
holidays = [start_date + datetime.timedelta(days=1)]
print(np.busday_count(start_date, end_date, holidays=holidays))

Upvotes: 0

Related Questions