alexsalo
alexsalo

Reputation: 1495

Dates to Durations in Pandas

I feel like this should be done very easily, yet I can't figure out how. I have a pandas DataFrame with column date:

0    2012-08-21
1    2013-02-17
2    2013-02-18
3    2013-03-03
4    2013-03-04
Name: date, dtype: datetime64[ns]

I want to have a columns of durations, something like:

0    0
1    80 days
2    1 day
3    15 days
4    1 day
Name: date, dtype: datetime64[ns]

My attempt yields bunch of 0 days and NaT instead:

>>> df.date[1:] - df.date[:-1]
0       NaT
1    0 days
2    0 days
...

Any ideas?

Upvotes: 6

Views: 129

Answers (3)

Stefan
Stefan

Reputation: 42875

Timedeltas are useful here: (see docs)

Starting in v0.15.0, we introduce a new scalar type Timedelta, which is a subclass of datetime.timedelta, and behaves in a similar manner, but allows compatibility with np.timedelta64 types as well as a host of custom representation, parsing, and attributes.

Timedeltas are differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. They can be both positive and negative.

df

           0
0 2012-08-21
1 2013-02-17
2 2013-02-18
3 2013-03-03
4 2013-03-04

You could:

pd.to_timedelta(df)

TimedeltaIndex(['0 days'], dtype='timedelta64[ns]', freq=None)
0      0
1    180
2      1
3     13
4      1
Name: 0, dtype: int64

Alternatively, you can calculate the difference between points in time using .shift() (or .diff() as illustrated by @Andy Hayden):

res = df-df.shift()

to get:

res.fillna(0)

         0
0   0 days
1 180 days
2   1 days
3  13 days
4   1 days

You can convert these from timedelta64 dtype to integer using:

res.fillna(0).squeeze().dt.days

0      0
1    180
2      1
3     13
4      1

Upvotes: 6

orange
orange

Reputation: 8068

df.date[1:] - df.date[:-1] doesn't do what you think it does. Each element is subtracted by series/dataframe index mapping, not by location in the series.

Calculating df.date[1:] - df.date[:-1] does:

+---- index of df.date[1:]
|                     +---- index of df.date[:-1]
|                     |
|                     v
v                     
                   -  0    2012-08-21    = NaT
1    2013-02-17    -  1    2013-02-17    = 0
2    2013-02-18    -  2    2013-02-18    = 0
3    2013-03-03    -  3    2013-03-03    = 0
4    2013-03-04    -                     = NaT

Upvotes: 2

Andy Hayden
Andy Hayden

Reputation: 375425

You can use diff:

In [11]: s
Out[11]:
0   2012-08-21
1   2013-02-17
2   2013-02-18
3   2013-03-03
4   2013-03-04
Name: date, dtype: datetime64[ns]

In [12]: s.diff()
Out[12]:
0        NaT
1   180 days
2     1 days
3    13 days
4     1 days
Name: date, dtype: timedelta64[ns]

In [13]: s.diff().fillna(0)
Out[13]:
0     0 days
1   180 days
2     1 days
3    13 days
4     1 days
Name: date, dtype: timedelta64[ns]

Upvotes: 4

Related Questions