RubenGeert

Reputation: 2952

Does xlrd retrieve date variables correctly from Excel?

I was trying to read a multi-sheet Excel workbook into SPSS when I stumbled upon the following problem: when I read a date variable from Excel into Python with xlrd, it seems to add 2 days to the date. Or perhaps my conversion from the Excel format to a more human-friendly representation is incorrect. Could anybody tell me what's wrong in the code below?

import datetime
import xlwt

wb = xlwt.Workbook()
ws = wb.add_sheet("date_1")
fmt = xlwt.easyxf(num_format_str='M/D/YY')
ws.write(0, 0, datetime.datetime.now(), fmt)
wb.save(r"d:\temp\datetest.xls")

#Now open Excel file manually -> date is correct

import xlrd

wb = xlrd.open_workbook(r"d:\temp\datetest.xls")
ws = wb.sheets()[0]
Data = ws.row_values(0)[0]
print datetime.datetime(1900, 1, 1, 0, 0, 0) + datetime.timedelta(days=Data)

#Now date is 2 days off

Upvotes: 0

Views: 1304

Answers (3)

John Machin

Reputation: 83032

Earlier answers are only partially correct.

Extra info:

There are TWO Excel date systems: 1900 (Windows) and 1904 (Mac).

1900 system: earliest non-ambiguous datetime is 1900-03-01T00:00:00, represented as 61.0.

1904 system: earliest non-ambiguous datetime is 1904-01-02T00:00:00, represented as 1.0.

Which date system is in effect is available in xlrd from Book.datemode.

xlrd supplies a function called xldate_as_tuple that takes care of all of the above. This code:

print datum
print datetime.datetime(1900, 1, 1) + datetime.timedelta(days=datum)
print datetime.datetime(1900, 3, 1) + datetime.timedelta(days=datum - 61)
tup = xlrd.xldate_as_tuple(datum, wb.datemode)
print tup
print datetime.datetime(*tup)

produces:

41274.4703588
2013-01-02 11:17:19
2012-12-31 11:17:19
(2012, 12, 31, 11, 17, 19)
2012-12-31 11:17:19

when wb.datemode is 0 (1900).

This information is all contained in the documentation that is distributed with xlrd.
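For readers without xlrd at hand, the arithmetic behind the two date systems can be sketched with the standard library alone. This is a minimal sketch, not xlrd's implementation: the helper name `excel_serial_to_datetime` and the epoch constants are hypothetical, and the lower bounds simply mirror the "earliest non-ambiguous" values quoted above.

```python
import datetime

# Epochs chosen so that epoch + serial lands on the right day.
# 1900 system: 1899-12-30 absorbs both the 1-based day count and the
# fictitious 1900-02-29, so it is only valid for serials >= 61.
EPOCH_1900 = datetime.datetime(1899, 12, 30)
# 1904 system: serial 1.0 is 1904-01-02, so serial 0 is 1904-01-01.
EPOCH_1904 = datetime.datetime(1904, 1, 1)


def excel_serial_to_datetime(serial, datemode=0):
    """Hypothetical helper: datemode 0 = 1900 system, 1 = 1904 system."""
    if datemode == 0:
        epoch, earliest = EPOCH_1900, 61
    else:
        epoch, earliest = EPOCH_1904, 1
    if serial < earliest:
        # Below these values the answer above calls the datetime ambiguous.
        raise ValueError("ambiguous Excel date serial: %r" % (serial,))
    return epoch + datetime.timedelta(days=serial)
```

Applied to the value above, `excel_serial_to_datetime(41274.4703588, 0)` lands on 2012-12-31 at 11:17:19, up to a fraction of a second. Where xlrd is installed, `xlrd.xldate_as_tuple(datum, wb.datemode)` remains the safer choice.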

Upvotes: 1

RubenGeert

Reputation: 2952

Nope. There are two things going on here.

1 - In Excel, "1" rather than "0" corresponds to January 1, 1900.
2 - Excel includes Feb 29, 1900 (which never occurred), accounting for the second day of difference. This is done on purpose for backward compatibility reasons.

Taking these two points into account seems to solve all issues.
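The two offsets can be checked with whole-day serial numbers. This is a minimal sketch for the 1900 date system only; `serial_to_date_1900` is a hypothetical helper name, and note that xlrd itself treats every serial below 61 as ambiguous rather than converting it.

```python
import datetime

JAN_1_1900 = datetime.date(1900, 1, 1)


def serial_to_date_1900(serial):
    """Hypothetical helper for whole-day serials in the 1900 date system."""
    if serial == 60:
        # Excel shows this serial as 1900-02-29, a day that never existed.
        raise ValueError("serial 60 is Excel's fictitious 1900-02-29")
    offset = 1          # point 1: serial 1, not 0, is 1900-01-01
    if serial > 60:
        offset += 1     # point 2: skip the non-existent 1900-02-29
    return JAN_1_1900 + datetime.timedelta(days=serial - offset)
```

With this, serial 59 maps to 1900-02-28, serial 61 to 1900-03-01, and serial 41274 to 2012-12-31, so any date after February 1900 needs the full two-day correction.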

Upvotes: 1

jdotjdot

Reputation: 17092

I'm pretty sure that xlrd is able to tell when a cell is formatted in Excel as a date, and to make the conversion to a Python date object on its own. It's not foolproof, though.

Your issue is probably caused by starting with datetime.datetime(1900,1,1,0,0,0) and adding the timedelta to it. You might want to try:

datetime.date(1899,12,31) + datetime.timedelta(days=Data)

This should avoid (a) the one day you're adding by starting at 1/1/1900 and (b) the one day you're adding (I'm guessing) by using a datetime object rather than a date, which may be pushing it over into the next day. This is just a guess, though.

Alternatively, if you already know that it's consistently two days, why don't you just do this?

print datetime.datetime(1900,1,1,0,0,0) + datetime.timedelta(days=Data - 2)
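On the first point about xlrd recognising date cells: in the question's code, `row_values` hands back a plain float, so the conversion still has to be done by the caller. The sketch below shows one way to convert only cells flagged as dates. It is an illustration under assumptions: `cell_to_python` is a hypothetical helper, the local constant is assumed to mirror `xlrd.XL_CELL_DATE`, and the default epoch is only valid for the 1900 date system and serials of 61 or more.

```python
import datetime

XL_CELL_DATE = 3  # assumed to mirror the xlrd.XL_CELL_DATE constant


def cell_to_python(ctype, value, epoch=datetime.datetime(1899, 12, 30)):
    """Hypothetical helper: convert cells flagged as dates, pass others through."""
    if ctype == XL_CELL_DATE:
        # The 1899-12-30 epoch already includes the two-day correction.
        return epoch + datetime.timedelta(days=value)
    return value
```

With an open sheet, this would presumably be called as `cell_to_python(ws.cell_type(0, 0), ws.cell_value(0, 0))`.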

Upvotes: 1
