Sledro

Reputation: 137

How can I parse date values for sklearn's linear regression?

I have the following pandas DataFrame index, groupedCrimes.index:

DatetimeIndex(['2014-06-30', '2014-07-31', '2014-08-31', '2014-09-30',
               '2014-10-31', '2014-11-30', '2014-12-31', '2015-01-31',
               '2015-02-28', '2015-03-31', '2015-04-30', '2015-05-31',
               '2015-06-30', '2015-07-31', '2015-08-31', '2015-09-30',
               '2015-10-31', '2015-11-30', '2015-12-31', '2016-01-31',
               '2016-02-29', '2016-03-31', '2016-04-30', '2016-05-31',
               '2016-06-30', '2016-07-31', '2016-08-31', '2016-09-30',
               '2016-10-31', '2016-11-30', '2016-12-31', '2017-01-31',
               '2017-02-28', '2017-03-31', '2017-04-30', '2017-05-31'],
              dtype='datetime64[ns]', name='Month', freq='M')

I am converting it from datetime64[ns] to numbers so I can use sklearn's LinearRegression on it.

# Convert the dates to numbers (days since the first date); I am not sure this is the best way
groupedCrimes.index = pd.to_datetime(groupedCrimes.index)
groupedCrimes.index = (groupedCrimes.index - groupedCrimes.index.min()) / np.timedelta64(1, 'D')

This converts it to the following:

[[0.00000000e+00]
 [3.58796296e-13]
 [7.17592593e-13]
 [1.06481481e-12]
 [1.42361111e-12]
 [1.77083333e-12]
 [2.12962963e-12]
 [2.48842593e-12]
 [2.81250000e-12]
 [3.17129630e-12]
 [3.51851852e-12]
 [3.87731481e-12]
 [4.22453704e-12]
 [4.58333333e-12]
 [4.94212963e-12]
 [5.28935185e-12]
 [5.64814815e-12]
 [5.99537037e-12]
 [6.35416667e-12]
 [6.71296296e-12]
 [7.04861111e-12]
 [7.40740741e-12]
 [7.75462963e-12]
 [8.11342593e-12]
 [8.46064815e-12]
 [8.81944444e-12]
 [9.17824074e-12]
 [9.52546296e-12]
 [9.88425926e-12]
 [1.02314815e-11]
 [1.05902778e-11]
 [1.09490741e-11]
 [1.12731481e-11]
 [1.16319444e-11]
 [1.19791667e-11]
 [1.23379630e-11]]

Then, for example, I can predict the value for one of these dates:

[in] model.predict(3.58796296e-13)
[out] array([5990.81354452])

How can I:

  1. Convert these numbers back to dates so I know which dates I am predicting for?
  2. Convert future dates into this format so I can make predictions for them?

Is there a better way to convert and handle the dates?

Upvotes: 1

Views: 734

Answers (1)

MaxU - stand with Ukraine

Reputation: 210982

What about simply converting the datetimes to the number of days since 1970-01-01?

In [386]: df
Out[386]:
                 val
2014-06-30  0.156202
2014-07-31  0.416251
2014-08-31  0.649295
2014-09-30  0.402265
2014-10-31  0.983870
2014-11-30  0.773942
2014-12-31  0.327271
2015-01-31  0.813580
2015-02-28  0.292830
2015-03-31  0.848269
...              ...
2016-08-31  0.595301
2016-09-30  0.171903
2016-10-31  0.355610
2016-11-30  0.477474
2016-12-31  0.517182
2017-01-31  0.891583
2017-02-28  0.591066
2017-03-31  0.799293
2017-04-30  0.225473
2017-05-31  0.444644

[36 rows x 1 columns]

In [387]: df.index = (df.index - pd.to_datetime('1970-01-01')).days

In [388]: df
Out[388]:
            val
16251  0.156202
16282  0.416251
16313  0.649295
16343  0.402265
16374  0.983870
16404  0.773942
16435  0.327271
16466  0.813580
16494  0.292830
16525  0.848269
...         ...
17044  0.595301
17074  0.171903
17105  0.355610
17135  0.477474
17166  0.517182
17197  0.891583
17225  0.591066
17256  0.799293
17286  0.225473
17317  0.444644

[36 rows x 1 columns]
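
With the index stored as day counts, it can be passed to sklearn as a single feature column. Here is a minimal sketch, assuming a fitted LinearRegression is the goal; the target values in `y` are made up for illustration and are not the question's crime counts:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# First four month-end dates from the question's index.
dates = pd.to_datetime(["2014-06-30", "2014-07-31", "2014-08-31", "2014-09-30"])

# Days since 1970-01-01, the same conversion as In [387].
days = (dates - pd.Timestamp("1970-01-01")).days

# sklearn expects a 2-D feature array of shape (n_samples, n_features).
X = np.asarray(days, dtype=float).reshape(-1, 1)

# Hypothetical target: an exactly linear function of the day count.
y = 2.0 * X.ravel() + 1.0

model = LinearRegression().fit(X, y)

# Predict for 2014-10-31, which is day 16374; the input must be 2-D as well.
pred = model.predict(np.array([[16374.0]]))
```

Note that `predict` takes a 2-D array, so a single day count has to be wrapped as `[[value]]` rather than passed as a bare scalar.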

To convert it back:

In [392]: pd.to_datetime(df.index, unit='D')
Out[392]:
DatetimeIndex(['2014-06-30', '2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31', '2014-11-30', '2014-12-31',
               '2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30', '2015-05-31', '2015-06-30', '2015-07-31',
               '2015-08-31', '2015-09-30', '2015-10-31', '2015-11-30', '2015-12-31', '2016-01-31', '2016-02-29',
               '2016-03-31', '2016-04-30', '2016-05-31', '2016-06-30', '2016-07-31', '2016-08-31', '2016-09-30',
               '2016-10-31', '2016-11-30', '2016-12-31', '2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31'],
              dtype='datetime64[ns]', freq=None)
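
The same two conversions also cover dates in the future. As a rough sketch, they can be wrapped in a pair of helpers; `dates_to_days` and `days_to_dates` are names I made up here, not pandas functions:

```python
import pandas as pd

EPOCH = pd.Timestamp("1970-01-01")

def dates_to_days(dates):
    """Convert date-like values to whole days since 1970-01-01."""
    return (pd.DatetimeIndex(dates) - EPOCH).days

def days_to_dates(days):
    """Convert day counts since 1970-01-01 back to a DatetimeIndex."""
    return pd.to_datetime(days, unit="D")

# Dates beyond the end of the question's index (which stops at 2017-05-31).
future_days = dates_to_days(["2017-06-30", "2017-07-31"])

# Round trip back to dates to check which dates the numbers stand for.
future_dates = days_to_dates(future_days)
```

The values in `future_days` are in the same units as the converted index, so they can be reshaped to a column and passed to a model that was fitted on that index.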

Upvotes: 3
