prabhaahar
prabhaahar

Reputation: 93

How to stop python auto date parsing while reading a excel file

i need to read a excel file without changing any date , time format , float format and convert to data-frame. This is working fine if i convert the excel to CSV and read it using read_csv() .

eg:

import pandas as pd
import numpy as np
#code for reading excel
df=pd.read_excel("605.xlsx",parse_dates=False,sheet_name="Group 1",keep_default_na=False,dtype=str)
print("df_excel:")
#code for reading csv   

df1=pd.read_csv("Group 1.csv",parse_dates=False,dtype=str,na_filter = False)
 print("df_csv:",df1)

output: enter image description here

in the above code parse_dates=False is working fine while reading CSV file, but parse_dates=False is not working in read_excel()

Expected output: Need the exact excel data into a data-frame without changing the date , time format.

Upvotes: 5

Views: 4548

Answers (2)

mouwsy
mouwsy

Reputation: 1933

As mentioned before, the docs of pd.read_excel say under the parameter parse_dates:

If you don't want to parse some cells as date just change their type in Excel to "Text".

This means one could preprocess the cells in Excel to the "Text" number format before using pd.read_excel. However, switching to "Text" number format alone changes the dates to numbers in Excel, e.g., 1900-12-31 becomes 366 (the cryptic number is the number of days since 1900-01-01 (it is 366 here because Excel falsely presumes that 1900 was a leap year)). This means that the instruction of just changing the number format to "Text" isn’t useful.

Instead, a practical solution is to iterate over the date columns and set the Excel VBA .Text as value and change the number format to "Text" to return the string exactly as it displays, which can be done with xlwings:

from pathlib import Path
import xlwings as xw
import pandas as pd

path = Path(r"foo/test.xlsx")

with xw.App(visible=False):
    wb = xw.Book(path)
    ws = wb.sheets[0]
    for cell in ws.used_range:
        temp_str = cell.api.Text
        cell.number_format = "@"  # @ is the number format code of Excel for the "Text" number format.
        cell.value = temp_str
    wb.save(path.with_stem(f"{path.stem}_interim"))
    # Then the Excel file can be read with xlwings.
    df = ws.used_range.options(pd.DataFrame, index=False).value
    wb.close()

# Or it can be read with pandas.
df = pd.read_excel(path.with_stem(f"{path.stem}_interim"))

print(df)

Note the prerequisites for the installation of xlwings.
Unfortunately the .Text property can only be used on a range of one cell, which is the reason why a loop over the range of cells in question is required. Note that I used used_range in this example, which means that all cells with contents on the worksheet are formatted to the "Text" number format.

Upvotes: 0

lok1
lok1

Reputation: 66

From the Pandas docs on the parse_dates parameter for read_excel():

If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. If you don`t want to parse some cells as date just change their type in Excel to “Text”.

You could try this:

df = pd.read_excel("605.xlsx",parse_dates=False,sheet_name="Group1",keep_default_na=False,dtype=str, converters={'as_at_date': str})

Explicitly converting the date column to string might help.

Upvotes: 0

Related Questions