lin_bug
lin_bug

Reputation: 1155

pandas error creating TimeDeltas from Datetime operation

I have looked at several other related questions here, here, and here, and none of them have come across quite the same problem as me.

I am using Pandas version 0.16.2. I have several columns in a Pandas dataframe, of dtype datetime64[ns]:

In [6]: date_list = ["SubmittedDate","PolicyStartDate", "PaidUpDate", "MaturityDate", "DraftDate", "CurrentValuationDate", "DOB", "InForceDate"]

In [11]: data[date_list].head()

Out[11]:
      SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate  \
    0           NaT      2002-11-18        NaT   2041-03-04       NaT
    1           NaT      2015-01-13        NaT          NaT       NaT
    2           NaT      2014-10-15        NaT          NaT       NaT
    3           NaT      2009-08-27        NaT          NaT       NaT
    4           NaT      2007-04-19        NaT   2013-10-01       NaT

      CurrentValuationDate        DOB InForceDate
    0           2015-04-30 1976-03-04  2002-11-18
    1                  NaT 1949-09-27  2015-01-13
    2                  NaT 1947-06-15  2014-10-15
    3           2015-07-30 1960-06-07  2009-08-27
    4           2010-04-21 1950-10-01  2007-04-19

These were originally in string format (e.g. '1976-03-04') which I converted to datetime objects using:

In [7]: for datecol in date_list:
   ...:         data[datecol] = pd.to_datetime(data[datecol], coerce=True, errors = 'raise')

Here are the dtypes for each of these columns:

In [8]: for datecol in date_list:
              print data[datecol].dtypes

returns:

datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]

So far, so good. But what I want to do is create a new column for each of these columns that gives the age in days (as an integer) from a certain date.

In [13]: current_date = pd.to_datetime("2015-07-31")

I first ran this:

In [14]: for i in date_list:
   ....:         data[i+"InDays"] = data[i].apply(lambda x: current_date - x)

However, when I check the dtype of the returned columns:

In [15]: for datecol in date_list:
   ....:         print data[datecol + "InDays"].dtypes

I get these:

object
timedelta64[ns]
object
timedelta64[ns]
object
timedelta64[ns]
timedelta64[ns]
timedelta64[ns]

I don't know why three of them are objects, when they should be timedeltas. What I want to do next is:

In [16]: for i in date_list:
   ....:         data[i+"InDays"] = data[i+"InDays"].dt.days

This approach works fine for the timedelta columns. However, since three of the columns are not timedeltas, I get this error:

AttributeError: Can only use .dt accessor with datetimelike values

I suspect that there are some values in those three columns that are preventing Pandas from converting them to timedeltas. I can't figure out how to work out what those values might be.

Upvotes: 4

Views: 637

Answers (1)

Anand S Kumar
Anand S Kumar

Reputation: 90879

The issue occurs because you have three columns with only NaT values, which is causing those columns to be treated as objects when you do apply your condition on it.

You should put some kind of condition in your apply part, to default to some timedelta in case of NaT. Example -

for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: current_date - x if x is not pd.NaT else pd.Timedelta(0))

Or if you cannot do the above, you should put a condition where you want to do - data[i+"InDays"] = data[i+"InDays"].dt.days , to take it only if the dtype of the series allows it.

Or a simpler way to change the apply part to directly get what you want would be -

for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else x)

This would output -

In [110]: data
Out[110]:
  SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate  \
0           NaT      2002-11-18        NaT   2041-03-04       NaT
1           NaT      2015-01-13        NaT          NaT       NaT
2           NaT      2014-10-15        NaT          NaT       NaT
3           NaT      2009-08-27        NaT          NaT       NaT
4           NaT      2007-04-19        NaT   2013-10-01       NaT

  CurrentValuationDate        DOB InForceDate SubmittedDateInDays  \
0           2015-04-30 1976-03-04  2002-11-18                 NaT
1                  NaT 1949-09-27  2015-01-13                 NaT
2                  NaT 1947-06-15  2014-10-15                 NaT
3           2015-07-30 1960-06-07  2009-08-27                 NaT
4           2010-04-21 1950-10-01  2007-04-19                 NaT

   PolicyStartDateInDays PaidUpDateInDays MaturityDateInDays DraftDateInDays  \
0                   4638              NaT              -9348             NaT
1                    199              NaT                NaN             NaT
2                    289              NaT                NaN             NaT
3                   2164              NaT                NaN             NaT
4                   3025              NaT                668             NaT

  CurrentValuationDateInDays  DOBInDays  InForceDateInDays
0                         92      14393               4638
1                        NaN      24048                199
2                        NaN      24883                289
3                          1      20142               2164
4                       1927      23679               3025

If you want your NaT to be changed to NaN you can use -

for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.NaN)

Example/Demo -

In [114]: for i in date_list:
   .....:     data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.NaN)
   .....:

In [115]: data
Out[115]:
  SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate  \
0           NaT      2002-11-18        NaT   2041-03-04       NaT
1           NaT      2015-01-13        NaT          NaT       NaT
2           NaT      2014-10-15        NaT          NaT       NaT
3           NaT      2009-08-27        NaT          NaT       NaT
4           NaT      2007-04-19        NaT   2013-10-01       NaT

  CurrentValuationDate        DOB InForceDate  SubmittedDateInDays  \
0           2015-04-30 1976-03-04  2002-11-18                  NaN
1                  NaT 1949-09-27  2015-01-13                  NaN
2                  NaT 1947-06-15  2014-10-15                  NaN
3           2015-07-30 1960-06-07  2009-08-27                  NaN
4           2010-04-21 1950-10-01  2007-04-19                  NaN

   PolicyStartDateInDays  PaidUpDateInDays  MaturityDateInDays  \
0                   4638               NaN               -9348
1                    199               NaN                 NaN
2                    289               NaN                 NaN
3                   2164               NaN                 NaN
4                   3025               NaN                 668

   DraftDateInDays  CurrentValuationDateInDays  DOBInDays  InForceDateInDays
0              NaN                          92      14393               4638
1              NaN                         NaN      24048                199
2              NaN                         NaN      24883                289
3              NaN                           1      20142               2164
4              NaN                        1927      23679               3025

Upvotes: 1

Related Questions