JDai

Reputation: 13

Python pandas apply date_range onto two columns

My dataframe looks like this:

df[['reported_date', 'current_date']].head()
    reported_date        current_date
0   2016-01-15 13:58:21  2016-01-18 00:00:00
1   2016-01-14 10:51:24  2016-01-18 00:00:00
2   2016-01-15 15:17:35  2016-01-18 00:00:00
3   2016-01-17 17:07:10  2016-01-18 00:00:00
4   2016-01-17 17:08:23  2016-01-18 00:00:00

I can subtract the two dates directly like this:

df[['reported_date', 'current_date']].head().apply(lambda x: x[1]-x[0], axis=1)

but when I tried to apply date_range to get the interval between the two dates, I got the following error:

df[['reported_date', 'current_date']].head().apply(lambda x: pd.date_range(x[0], x[1], freq='B'), axis=1)

"ValueError: Length of values does not match length of index"

So what's the right way to apply date_range() to two datetime columns?

Thank you in advance.

jian

Upvotes: 1

Views: 1701

Answers (1)

user707650

Reputation:

pd.date_range doesn't return an interval. It returns a DatetimeIndex containing every timestamp between start and end at the requested frequency (with freq='B', one per business day). Since the start is reported_date here and varies from row to row, while the end is current_date and is fixed, each row produces a result of a different length, and those don't fit nicely into a single (new) column.
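To see the varying lengths, here is a minimal sketch using three of the rows from the question. The list comprehension and the normalize=True argument are my own choices for illustration (normalize=True drops the time of day so only calendar dates matter); they are not part of the original code.

```python
import pandas as pd

# Three sample rows copied from the question.
df = pd.DataFrame({
    "reported_date": pd.to_datetime([
        "2016-01-15 13:58:21",
        "2016-01-14 10:51:24",
        "2016-01-17 17:07:10",
    ]),
    "current_date": pd.to_datetime(["2016-01-18"] * 3),
})

# Build one DatetimeIndex per row. A plain list sidesteps apply(),
# which may try to expand each result into columns.
ranges = [
    pd.date_range(start, end, freq="B", normalize=True)
    for start, end in zip(df["reported_date"], df["current_date"])
]

# The number of business days differs from row to row.
lengths = [len(r) for r in ranges]
print(lengths)
```

If you really do want the full list of business days per row, keeping them in a list (or an object-dtype column) like this is one option, but it is rarely what you need just to measure elapsed time.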

The subtraction you used earlier already gives you the interval between the dates, so there is no reason to use pd.date_range here: x[1] - x[0] does exactly what you want.
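As a rough sketch, the subtraction can also be done on whole columns without apply(). The second half assumes that what you are after is a count of business days, which the question doesn't state outright; np.busday_count is a NumPy function that counts weekdays from start up to, but not including, end. The column names elapsed and business_days are made up for this example.

```python
import numpy as np
import pandas as pd

# Three sample rows copied from the question.
df = pd.DataFrame({
    "reported_date": pd.to_datetime([
        "2016-01-15 13:58:21",
        "2016-01-14 10:51:24",
        "2016-01-17 17:07:10",
    ]),
    "current_date": pd.to_datetime(["2016-01-18"] * 3),
})

# Column-wise subtraction yields one Timedelta per row.
df["elapsed"] = df["current_date"] - df["reported_date"]

# Assumption: a business-day count is wanted. busday_count works on
# day-precision dates, so the time of day is truncated first.
start = df["reported_date"].to_numpy().astype("datetime64[D]")
end = df["current_date"].to_numpy().astype("datetime64[D]")
df["business_days"] = np.busday_count(start, end)

print(df[["elapsed", "business_days"]])
```

Note that busday_count uses a Monday to Friday week by default and knows nothing about holidays unless you pass them in.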

Upvotes: 3
