Reading date from .xlsx file returns 5 digit timestamp, what is this?

Question

My script converts a .xlsx file into many .csv files (one .csv per sheet).

Here is the code:

wb = xlrd.open_workbook(filepath)

for i in range(0, wb.nsheets):
    sh = wb.sheet_by_index(i)
    sheet_name = sh.name
    sheet_name = sheet_name.replace(" ", "_");
    fp = open(sheet_name+'.csv', 'at', encoding='utf8')
    wr = csv.writer(fp, quoting=csv.QUOTE_ALL)

    for row_num in range(sh.nrows):
        wr.writerow(sh.row_values(row_num))

    fp.close()

Full code here: https://github.com/andreipdev/xlsx_to_csv

This works well, except that I have a field that is a date, in excel it shows for example: 01/01/2009

But the final csv, contains a number that is 39814. What is this please, what can I do with it? 01/02/2009 is 39815.

Is it a number I can use to find the unix timestamp? Or is it an issue and I should change my script? I would feel safer if it would just return the string "01/01/2009".

unutbu · Accepted Answer

If 39814 maps to 2009-1-1 and 39815 maps to 2009-1-2, then it looks like the ordinal is counting the number of days since 1899-12-30:

In [57]: DT.date(1899,12,30) + DT.timedelta(days=39814)
Out[57]: datetime.date(2009, 1, 1)

See Why 1899-12-30!?

To convert the Excel number to a Unix timestamp, you could use the timetuple method to convert the datetime.date object to a timetuple, and then time.mktime to convert it to a timestamp (seconds since the Epoch):

In [80]: import time

In [81]: time.mktime((DT.datetime(1899,12,30) + DT.timedelta(days=39814)).timetuple())
Out[81]: 1230786000.0

Reading date from .xlsx file returns 5 digit timestamp, what is this?

Answers (1)

Related Questions