Ethan

Reputation: 534

Compute date difference in days in pandas

I've got a dataframe that looks like this

    date        id
0   2019-01-15  c-15-Jan-2019-0
1   2019-01-26  c-26-Jan-2019-1
2   2019-02-02  c-02-Feb-2019-2
3   2019-02-15  c-15-Feb-2019-3
4   2019-02-23  c-23-Feb-2019-4

and I'd like to create a new column called 'days_since' that shows the number of days elapsed since the previous record. For instance, the new column would be

    date        id              days_since
0   2019-01-15  c-15-Jan-2019-0 NaN
1   2019-01-26  c-26-Jan-2019-1 11
2   2019-02-02  c-02-Feb-2019-2 7
3   2019-02-15  c-15-Feb-2019-3 13
4   2019-02-23  c-23-Feb-2019-4 8

I tried to use

df_c['days_since'] = df_c.groupby('id')['date'].diff().apply(lambda x: x.days)

but that just returned a column full of null values. The date column is full of datetime objects. Any ideas?

Upvotes: 1

Views: 1136

Answers (1)

willeM_ Van Onsem

Reputation: 477759

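I think you are making this more complicated than it needs to be. Every id in your frame is unique, so groupby('id') puts each row in its own group and diff() has no previous row to subtract from, which is why you get only null values. Since the date column contains datetime data, you can call diff() on it directly: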

>>> df['date'].diff()
0       NaT
1   11 days
2    7 days
3   13 days
4    8 days
Name: date, dtype: timedelta64[ns]

or if you want the number of days:

>>> df['date'].diff().dt.days
0     NaN
1    11.0
2     7.0
3    13.0
4     8.0
Name: date, dtype: float64
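
The result is float64 because NaN cannot be stored in a plain integer column. If whole-number days are preferred, one option (not part of the original answer, and assuming a pandas version that supports the nullable Int64 dtype) is to cast the result. A minimal sketch, with the dates copied from the question:

```python
import pandas as pd

# Dates copied from the question; the Series name mirrors the 'date' column.
dates = pd.Series(
    pd.to_datetime(
        ["2019-01-15", "2019-01-26", "2019-02-02", "2019-02-15", "2019-02-23"]
    ),
    name="date",
)

# diff() gives timedeltas, .dt.days extracts the day count,
# and the nullable Int64 dtype keeps the first (missing) value as <NA>.
days = dates.diff().dt.days.astype("Int64")
print(days)
```

The first entry stays missing, and the remaining entries are stored as integers rather than floats.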

So you can assign the number of days with:

df['days_since'] = df['date'].diff().dt.days

This gives us:

>>> df
        date  days_since
0 2019-01-15         NaN
1 2019-01-26        11.0
2 2019-02-02         7.0
3 2019-02-15        13.0
4 2019-02-23         8.0
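
For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the frame from the question and contrasts the groupby attempt with the plain diff(). The frame is reconstructed by hand from the sample shown in the question, so treat it as an approximation of the asker's real data:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-01-15", "2019-01-26", "2019-02-02", "2019-02-15", "2019-02-23"]
        ),
        "id": [
            "c-15-Jan-2019-0",
            "c-26-Jan-2019-1",
            "c-02-Feb-2019-2",
            "c-15-Feb-2019-3",
            "c-23-Feb-2019-4",
        ],
    }
)

# Every id is unique, so each group holds a single row
# and the grouped diff() is NaT for every row.
grouped = df.groupby("id")["date"].diff()
print(grouped.isna().all())

# Differencing the whole column compares each row with the one before it.
df["days_since"] = df["date"].diff().dt.days
print(df)
```

Note that diff() compares each row with the one directly above it, so this assumes the frame is already sorted by date. Grouping only makes sense when several rows share the same key.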

Upvotes: 5
