Reputation: 635
I read a large Excel file into pandas using .read_excel
, and the file has date columns. When read into pandas, the dates default to a timestamp. Since the file is large, I would like to read the dates as a string.
If that is not possible, then I would at least like to export the date back to Excel in the same format as it is in the original file (e.g. "8/18/2009"
).
My two questions are:
Upvotes: 11
Views: 21774
Reputation: 1933
The docs of pd.read_excel
say under the parameter parse_dates
:
If you don't want to parse some cells as date just change their type in Excel to "Text".
This means one could preprocess the cells in Excel to the "Text" number format before using pd.read_excel
. However, switching to "Text" number format alone changes the dates to numbers in Excel, e.g., 1900-12-31 becomes 366 (the cryptic number is the number of days since 1900-01-01 (it is 366 here because Excel falsely presumes that 1900 was a leap year)). This means that the instruction of just changing the number format to "Text" isn’t useful.
Instead, a practical solution is to iterate over the date columns and set the Excel VBA .Text
as value and change the number format to "Text" to return the string how it displays, which can be done with xlwings:
from pathlib import Path
import xlwings as xw
import pandas as pd
path = Path(r"foo/test.xlsx")
with xw.App(visible=False):
wb = xw.Book(path)
ws = wb.sheets[0]
for cell in ws.used_range:
temp_str = cell.api.Text
cell.number_format = "@" # @ is the number format code of Excel for the "Text" number format.
cell.value = temp_str
wb.save(path.with_stem(f"{path.stem}_interim"))
# Then the Excel file can be read with xlwings.
df = ws.used_range.options(pd.DataFrame, index=False).value
wb.close()
# Or it can be read with pandas.
df = pd.read_excel(path.with_stem(f"{path.stem}_interim"))
print(df)
Note the prerequisites for the installation of xlwings.
Unfortunately the .Text
property can only be used on a range of one cell, which is the reason why a loop over the range of cells in question is required. Note that I used used_range in this example, which means that all cells with contents on the worksheet are formatted to the "Text" number format.
Upvotes: 0
Reputation: 31
import pandas as pd
df = pd.read_excel(
"file_to_read.xlsx",
sheet_name="sheetname",
)
writer = pd.ExcelWriter(
"file_to_write.xlsx",
engine="xlsxwriter",
datetime_format="mm/dd/yyyy",
)
df.to_excel(
writer,
index=False,
header=True,
sheet_name="sheetname",
)
Upvotes: 3
Reputation: 11
I had the same problem. This is what solved the issue for me:
df = pd.read_excel(excel_link, sheet_name, dtype=str)
If you don't mind converting the df or entire column to string
Upvotes: 1
Reputation: 135
this is similar as issue here. Leave dates as strings using read_excel function from pandas in python
check the answers:
pandas.read_excel(xlsx, sheet, converters={'Date': str})
df['Date'][0].strftime('%Y/%m/%d')
Upvotes: 1