Reputation: 137
I have the following pandas DataFrame index (groupedCrimes.index):
DatetimeIndex(['2014-06-30', '2014-07-31', '2014-08-31', '2014-09-30',
'2014-10-31', '2014-11-30', '2014-12-31', '2015-01-31',
'2015-02-28', '2015-03-31', '2015-04-30', '2015-05-31',
'2015-06-30', '2015-07-31', '2015-08-31', '2015-09-30',
'2015-10-31', '2015-11-30', '2015-12-31', '2016-01-31',
'2016-02-29', '2016-03-31', '2016-04-30', '2016-05-31',
'2016-06-30', '2016-07-31', '2016-08-31', '2016-09-30',
'2016-10-31', '2016-11-30', '2016-12-31', '2017-01-31',
'2017-02-28', '2017-03-31', '2017-04-30', '2017-05-31'],
dtype='datetime64[ns]', name='Month', freq='M')
I am converting it from datetime64[ns] to a numeric type so I can use scikit-learn's LinearRegression on it.
# Change the dates to numbers (days since the earliest date); I am not sure this is the best way
groupedCrimes.index = pd.to_datetime(groupedCrimes.index)
groupedCrimes.index = (groupedCrimes.index - groupedCrimes.index.min()) / np.timedelta64(1,'D')
This converts it to the following:
[[0.00000000e+00]
[3.58796296e-13]
[7.17592593e-13]
[1.06481481e-12]
[1.42361111e-12]
[1.77083333e-12]
[2.12962963e-12]
[2.48842593e-12]
[2.81250000e-12]
[3.17129630e-12]
[3.51851852e-12]
[3.87731481e-12]
[4.22453704e-12]
[4.58333333e-12]
[4.94212963e-12]
[5.28935185e-12]
[5.64814815e-12]
[5.99537037e-12]
[6.35416667e-12]
[6.71296296e-12]
[7.04861111e-12]
[7.40740741e-12]
[7.75462963e-12]
[8.11342593e-12]
[8.46064815e-12]
[8.81944444e-12]
[9.17824074e-12]
[9.52546296e-12]
[9.88425926e-12]
[1.02314815e-11]
[1.05902778e-11]
[1.09490741e-11]
[1.12731481e-11]
[1.16319444e-11]
[1.19791667e-11]
[1.23379630e-11]]
Then for example I can predict one of these values as a date:
[in] model.predict(3.58796296e-13)
[out] array([5990.81354452])
How can I:
Is there a better way to convert and handle the dates?
Upvotes: 1
Views: 734
Reputation: 210982
What about simply converting the datetimes to the number of days since 1970-01-01?
In [386]: df
Out[386]:
val
2014-06-30 0.156202
2014-07-31 0.416251
2014-08-31 0.649295
2014-09-30 0.402265
2014-10-31 0.983870
2014-11-30 0.773942
2014-12-31 0.327271
2015-01-31 0.813580
2015-02-28 0.292830
2015-03-31 0.848269
... ...
2016-08-31 0.595301
2016-09-30 0.171903
2016-10-31 0.355610
2016-11-30 0.477474
2016-12-31 0.517182
2017-01-31 0.891583
2017-02-28 0.591066
2017-03-31 0.799293
2017-04-30 0.225473
2017-05-31 0.444644
[36 rows x 1 columns]
In [387]: df.index = (df.index - pd.to_datetime('1970-01-01')).days
In [388]: df
Out[388]:
val
16251 0.156202
16282 0.416251
16313 0.649295
16343 0.402265
16374 0.983870
16404 0.773942
16435 0.327271
16466 0.813580
16494 0.292830
16525 0.848269
... ...
17044 0.595301
17074 0.171903
17105 0.355610
17135 0.477474
17166 0.517182
17197 0.891583
17225 0.591066
17256 0.799293
17286 0.225473
17317 0.444644
[36 rows x 1 columns]
To convert it back:
In [392]: pd.to_datetime(df.index, unit='D')
Out[392]:
DatetimeIndex(['2014-06-30', '2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31', '2014-11-30', '2014-12-31',
'2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30', '2015-05-31', '2015-06-30', '2015-07-31',
'2015-08-31', '2015-09-30', '2015-10-31', '2015-11-30', '2015-12-31', '2016-01-31', '2016-02-29',
'2016-03-31', '2016-04-30', '2016-05-31', '2016-06-30', '2016-07-31', '2016-08-31', '2016-09-30',
'2016-10-31', '2016-11-30', '2016-12-31', '2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
'2017-05-31'],
dtype='datetime64[ns]', freq=None)
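As a rough sketch of how the integer day index could then be fed to scikit-learn, assuming a single-feature LinearRegression is what you are after: the `counts` values below are made up purely for illustration (they are not your data), and the six dates are just the first few entries from your index.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# First six month-end dates from the question's index.
idx = pd.to_datetime(['2014-06-30', '2014-07-31', '2014-08-31',
                      '2014-09-30', '2014-10-31', '2014-11-30'])

# Days since 1970-01-01, as in the conversion above.
epoch = pd.Timestamp('1970-01-01')
days = (idx - epoch).days.to_numpy()

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = days.reshape(-1, 1)

# Hypothetical target values, deliberately an exact linear function of the day number.
counts = 100 + 0.5 * (days - days[0])

model = LinearRegression().fit(X, counts)

# Predict for a date outside the training range by converting it the same way.
future = pd.Timestamp('2014-12-31')
future_days = (future - epoch).days
pred = model.predict(np.array([[future_days]]))

# Map the integer feature back to dates for display.
restored = pd.to_datetime(X.ravel(), unit='D')
```

Note that `predict` is given a 2-D array rather than a bare scalar. Subtracting the earliest date instead of 1970-01-01 would give smaller numbers; for a plain linear fit that should only shift the intercept, not the slope.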
Upvotes: 3