Reputation: 63
I'm trying to read an Excel file with OpenPyXL, but the values from Date/Time fields are not accurate.
import openpyxl
wb = openpyxl.load_workbook('doc.xlsx')
ws = wb.active
for rowNum in range(2, ws.max_row + 1):
date = ws['A' + str(rowNum)].value
print(date)
Excel data: 2019-08-16 06:55:00
Value I get in Python: 2019-08-16 06:54:59.999996
How could I read Date/Time fields accurately?
Upvotes: 2
Views: 1594
Reputation: 1091
According to this issue, the problem is a result of Excel's implementation; not OpenPyXL's:
The XLSX file format cannot be relied upon to store dates and times accurately. These are always converted internally to a serial form of limited precision.
You could represent the values as text in Excel and then parse that text. For example...
Say you have a Date value in cell A2 with the custom format "m/d/yy h:mm:ss":
8/16/19 6:55:00
In a separate column, you can write the formula
=TEXT(A2,"yyyy-mm-dd")&"T"&TEXT(A2,"hh:mm:ss")
and copy to other cells in the column, resulting in
2019-08-16T06:55:00
If you don't want this separate column, just overwrite column A by copying and pasting the values from the new text column into it.
Now, when you read the values you'll need to be careful to use the option data_only=True
with openpyxl.load_workbook
so you read the actual value of the text; not the formula.
import openpyxl
wb = openpyxl.load_workbook('doc.xlsx', data_only=True)
ws = wb.active
for rowNum in range(2, ws.max_row + 1):
date = ws['A' + str(rowNum)].value
print(date)
Then parse the text in date
according to the ISO 8601 format.
One drawback to the text option is that it requires more work in the Excel spreadsheet. If you know the precision you're going for, you can have Python round the values for you.
Caution should be taken, however! If the desired precision is too small, the lack of precision on Excel's part could conceivably result in rounding to incorrect numbers. This could be especially true of compounding formulas, such as =<above cell>+1/24
.
See this question (and particularly this answer) for a good solution to rounding datetimes. Here's a spin-off of solution I used for my own code (note f-string use; you'll need to use str.format()
if you're using Python < 3.6):
import datetime as dt
def round_datetime(d, precision='second'):
"""Round a datetime to the named precision.
Cf. https://stackoverflow.com/a/3464000/7232335
"""
d += dt.timedelta(**{f"{precision}s": 1}) / 2
d -= dt.timedelta(minutes=d.minute,
seconds=d.second,
microseconds=d.microsecond)
return d # Return optional, since object d is modified in-place.
There are Worksheet methods available for iterating through rows with OpenPyXL:
ws.iter_rows()
for row in ws.iter_rows(min_row=2, values_only=True):
date = row[0]
ws.values
Same as ws.iter_rows()
with no arguments except values_only=True
.
values = ws.values
next(values) # Discard first row.
for row in values:
date = row[0]
ws.rows
Same as ws.iter_rows()
with no arguments.
rows = ws.rows
next(rows) # Discard first row.
for row in ws.rows:
date = row[0].value
Upvotes: 1
Reputation: 43585
Are you sure you are referring to the correct Excel file and the value is written as it is? Tried to replicate:
import openpyxl
wb = openpyxl.load_workbook('doc.xlsx')
ws = wb.active
date = ws['A1'].value
print(date)
and this is what I got with Python3 and Excel 2010:
Upvotes: 0