Reputation: 1071
I have this data frame where the date columns are 'datetime' type
datetime64[ns]
2014-04-30T00:00:00 000000000
Now I want the date in this format - 2014-04-30 . So I used below code
df['StartingDate2'] = XY['StartingDate'].dt.strftime('%m/%d/%Y')
Now, this works accurately, but converts my date to object type. I read here that In Python, dates are objects. But I want to write my final data frame as a table in redshift and so I need my date columns defined as date for my python data frame. Any suggestion about how to do that , will be greatly appreciated.
Update:
sample dataframe
p1 = {'name': ['johnny', 'tommy', 'bobby', 'rocky', 'jimmy'], 'StartingDate': ['2015-07-14T00:00:00.000000000', '2013-10-30T00:00:00.000000000', '2014-04-30T00:00:00.000000000', '2014-01-27T00:00:00.000000000', '2016-01-15T00:00:00.000000000'], 'Address': ['NY', 'NJ', 'PA', 'NY', 'CA'], 'comment1': ['Very good performance', 'N/A', 'Need to work hard', 'No Comment', 'Not satisfactory'], 'comment2': ['good', 'Meets Expectation', 'N', 'N/A', 'Incompetence']}
XY = pd.DataFrame(data = p1)
XY['today'] = datetime.datetime.now()
When I am using to_datetime() solution - it doesn't work
XY['today2'] = pd.to_datetime(XY['today'], format = '%m/%d/%Y')
XY['StartingDate2'] = pd.to_datetime(XY['today'], format = '%m/%d/%Y')
Alternatively - this works, when strftime() and to_datetime() are used in combination.
XY['StartingDate2'] = XY['StartingDate2'].dt.strftime('%m/%d/%Y')
XY['StartingDate2'] = pd.to_datetime(XY['StartingDate2'])
But although this solution works for the sample data, is not working for me. The data I have looks like this -
array(['2015-09-29T14:34:39.000000000', '2015-10-07T14:13:03.000000000',
'2015-10-07T19:17:50.000000000', ...,
'2017-12-05T14:06:42.000000000', '2017-12-06T16:36:44.000000000',
'2017-12-06T18:26:49.000000000'], dtype='datetime64[ns]'
Upvotes: 2
Views: 8512
Reputation: 5437
The solution is to use to_datetime
s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'])
s
0 3/11/2000
1 3/12/2000
2 3/13/2000
dtype: object
pd.to_datetime(s)
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
So in your situation, you could write
df['StartingDate2'] = pd.to_datetime(XY['StartingDate'], format='%m/%d/%Y')
and in fact you can neglect the format keyword here. But if you provide it, you will get a huge speed increase.
Benchmarks
s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000']*1000)
Without format
%%timeit
pd.to_datetime(s)
453 ms ± 3.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With format
%%timeit
pd.to_datetime(s, format='%m/%d/%Y')
9.68 ms ± 44.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Note for me: Always provide the format string if possible
Answer for updated question
You can use date times floor function
df = pd.DataFrame(pd.to_datetime(['2015-09-29T14:34:39.000000000', '2015-10-07T14:13:03.000000000',
'2015-10-07T19:17:50.000000000',
'2017-12-05T14:06:42.000000000', '2017-12-06T16:36:44.000000000',
'2017-12-06T18:26:49.000000000']), columns=['A'])
df['B'] = df['A'].dt.floor('d')
df.dtypes
A datetime64[ns]
B datetime64[ns]
dtype: object
df
A B
0 2015-09-29 14:34:39 2015-09-29
1 2015-10-07 14:13:03 2015-10-07
2 2015-10-07 19:17:50 2015-10-07
3 2017-12-05 14:06:42 2017-12-05
4 2017-12-06 16:36:44 2017-12-06
5 2017-12-06 18:26:49 2017-12-06
Upvotes: 3