Reputation: 400
I know that you can specify data types when reading excels using pd.read_excel
(as outlined here). Can you do the same using pd.ExcelFile
?
I have the following code:
if ".xls" in
xl = pd.ExcelFile(path + "\\" + name, )
for sheet in xl.sheet_names:
xl_parsed = xl.parse(sheet)
When parsing the sheet, some of the values in the columns are displayed in scientific notation. I don't know the column names before loading so I need to import everything as string. Ideally I would like to be able to do something like xl_parsed = xl.parse(sheet, dtype = str)
. Any suggestions?
Upvotes: 5
Views: 7793
Reputation: 400
I went with roganjosh's suggestion - open the excel first, get column names and then pass as converter.
if ".xls" in name:
xl = pd.ExcelFile(path)
sheetCounter = 1
for sheet in xl.sheet_names:
### Force to read as string ###
column_list = []
df_column = pd.read_excel(path, sheetCounter - 1).columns
for i in df_column:
column_list.append(i)
converter = {col: str for col in column_list}
##################
xl_parsed = xl.parse(sheet, converters=converter)
sheetCounter = sheetCounter + 1
Upvotes: 1
Reputation: 1819
If you would prefer a cleaner solution, I used the following:
excel = pd.ExcelFile(path)
for sheet in excel.sheet_names:
columns = excel.parse(sheet).columns
converters = {column: str for column in columns}
data = excel.parse(sheet, converters=converters)
Upvotes: 3