user18101
user18101

Reputation: 635

Pandas.read_excel reads date into timestamp, I want a string

I read a large Excel file into pandas using .read_excel, and the file has date columns. When read into pandas, the dates default to a timestamp. Since the file is large, I would like to read the dates as a string.

If that is not possible, then I would at least like to export the date back to Excel in the same format as it is in the original file (e.g. "8/18/2009").

My two questions are:

  1. Can I avoid converting the Excel date into a timestamp in pandas?
  2. If not possible, how can I write back the date in the original format efficiently?

Upvotes: 11

Views: 21774

Answers (4)

mouwsy
mouwsy

Reputation: 1933

The docs of pd.read_excel say under the parameter parse_dates:

If you don't want to parse some cells as date just change their type in Excel to "Text".

This means one could preprocess the cells in Excel to the "Text" number format before using pd.read_excel. However, switching to "Text" number format alone changes the dates to numbers in Excel, e.g., 1900-12-31 becomes 366 (the cryptic number is the number of days since 1900-01-01 (it is 366 here because Excel falsely presumes that 1900 was a leap year)). This means that the instruction of just changing the number format to "Text" isn’t useful.

Instead, a practical solution is to iterate over the date columns and set the Excel VBA .Text as value and change the number format to "Text" to return the string how it displays, which can be done with xlwings:

from pathlib import Path
import xlwings as xw
import pandas as pd

path = Path(r"foo/test.xlsx")

with xw.App(visible=False):
    wb = xw.Book(path)
    ws = wb.sheets[0]
    for cell in ws.used_range:
        temp_str = cell.api.Text
        cell.number_format = "@"  # @ is the number format code of Excel for the "Text" number format.
        cell.value = temp_str
    wb.save(path.with_stem(f"{path.stem}_interim"))
    # Then the Excel file can be read with xlwings.
    df = ws.used_range.options(pd.DataFrame, index=False).value
    wb.close()

# Or it can be read with pandas.
df = pd.read_excel(path.with_stem(f"{path.stem}_interim"))

print(df)

Note the prerequisites for the installation of xlwings.
Unfortunately the .Text property can only be used on a range of one cell, which is the reason why a loop over the range of cells in question is required. Note that I used used_range in this example, which means that all cells with contents on the worksheet are formatted to the "Text" number format.

Upvotes: 0

Chirantan De
Chirantan De

Reputation: 31

  1. I am not sure how to read the date and not convert into timestamp using read_excel.
  2. Because the date is already converted into datetime while reading it into a dataframe, here is how the date can be printed in the original format - I have used 'mm/dd/yyyy'.
import pandas as pd

df = pd.read_excel(
    "file_to_read.xlsx",
    sheet_name="sheetname",
)
writer = pd.ExcelWriter(
    "file_to_write.xlsx",
    engine="xlsxwriter",
    datetime_format="mm/dd/yyyy",
)
df.to_excel(
    writer,
    index=False,
    header=True,
    sheet_name="sheetname",
)

Upvotes: 3

WhyNotIndeed
WhyNotIndeed

Reputation: 11

I had the same problem. This is what solved the issue for me:

df = pd.read_excel(excel_link, sheet_name, dtype=str)

If you don't mind converting the df or entire column to string

Upvotes: 1

YDD9
YDD9

Reputation: 135

this is similar as issue here. Leave dates as strings using read_excel function from pandas in python

check the answers:

  • Using converters{'Date': str} option inside the pandas.read_excel which helps.
    pandas.read_excel(xlsx, sheet, converters={'Date': str})
  • you can try convert your timestamp back to the original format
    df['Date'][0].strftime('%Y/%m/%d')

Upvotes: 1

Related Questions