Reputation: 93
i need to read a excel file without changing any date , time format , float format and convert to data-frame. This is working fine if i convert the excel to CSV and read it using read_csv() .
eg:
import pandas as pd
import numpy as np
#code for reading excel
df=pd.read_excel("605.xlsx",parse_dates=False,sheet_name="Group 1",keep_default_na=False,dtype=str)
print("df_excel:")
#code for reading csv
df1=pd.read_csv("Group 1.csv",parse_dates=False,dtype=str,na_filter = False)
print("df_csv:",df1)
in the above code parse_dates=False is working fine while reading CSV file, but parse_dates=False is not working in read_excel()
Expected output: Need the exact excel data into a data-frame without changing the date , time format.
Upvotes: 5
Views: 4548
Reputation: 1933
As mentioned before, the docs of pd.read_excel
say under the parameter parse_dates
:
If you don't want to parse some cells as date just change their type in Excel to "Text".
This means one could preprocess the cells in Excel to the "Text" number format before using pd.read_excel
. However, switching to "Text" number format alone changes the dates to numbers in Excel, e.g., 1900-12-31 becomes 366 (the cryptic number is the number of days since 1900-01-01 (it is 366 here because Excel falsely presumes that 1900 was a leap year)). This means that the instruction of just changing the number format to "Text" isn’t useful.
Instead, a practical solution is to iterate over the date columns and set the Excel VBA .Text
as value and change the number format to "Text" to return the string exactly as it displays, which can be done with xlwings:
from pathlib import Path
import xlwings as xw
import pandas as pd
path = Path(r"foo/test.xlsx")
with xw.App(visible=False):
wb = xw.Book(path)
ws = wb.sheets[0]
for cell in ws.used_range:
temp_str = cell.api.Text
cell.number_format = "@" # @ is the number format code of Excel for the "Text" number format.
cell.value = temp_str
wb.save(path.with_stem(f"{path.stem}_interim"))
# Then the Excel file can be read with xlwings.
df = ws.used_range.options(pd.DataFrame, index=False).value
wb.close()
# Or it can be read with pandas.
df = pd.read_excel(path.with_stem(f"{path.stem}_interim"))
print(df)
Note the prerequisites for the installation of xlwings.
Unfortunately the .Text
property can only be used on a range of one cell, which is the reason why a loop over the range of cells in question is required. Note that I used used_range in this example, which means that all cells with contents on the worksheet are formatted to the "Text" number format.
Upvotes: 0
Reputation: 66
From the Pandas docs on the parse_dates parameter for read_excel():
If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. If you don`t want to parse some cells as date just change their type in Excel to “Text”.
You could try this:
df = pd.read_excel("605.xlsx",parse_dates=False,sheet_name="Group1",keep_default_na=False,dtype=str, converters={'as_at_date': str})
Explicitly converting the date column to string might help.
Upvotes: 0