Reputation: 1295
I have converted some excel files using pandas in csv. I noticed that the date is in decimal format now. However, it is not clear to me how to convert to datetime
or at least timestamp.
So the converted dates look like:
42794.741667
I have understood that before the decimal is the days from 1900, but I am not finding anything regarding the second part of the decimal number. A lot of questions suggest to directly convert from read_excel: pandas automatically converting date. However, I can not do that as the excel files are pretty complex and do not have a predefined structure.
Upvotes: 0
Views: 1438
Reputation: 16952
Excel dates are not days from 1970, they are days from 30-Dec-1899. (You may be thinking of Posix timestamps, which is seconds from 1-Jan-1970).
Excel reports day 1 as 1-Jan-1900 (which would imply an epoch of 31-Dec-1899) but Excel also thinks that 1900 was a leap year, which it wasn't, so Excel's count is off by one for dates between 1-Jan-1900 and 28-Feb-1900.
This behaviour is deliberate. In the early days, Excel was competing with the market leader, Lotus 1-2-3, which also thought that 1900 was a leap year, and by the time people began to care about it, it was considered too disruptive to fix.
You can convert Excel dates to Python datetime
s that pandas
will understand like this:
>>> import datetime
>>> datetime.datetime(1899,12,30) + datetime.timedelta(days=42794.741667)
datetime.datetime(2017, 2, 28, 17, 48, 0, 28800)
If you paste 42794.741667 into an empty Excel spreadsheet and format it as a date, you can confirm the conversion.
Upvotes: 5