michal111
michal111

Reputation: 400

Pandas ExcelFile read columns as string

I know that you can specify data types when reading excels using pd.read_excel (as outlined here). Can you do the same using pd.ExcelFile?

I have the following code:

 if ".xls" in 
     xl = pd.ExcelFile(path + "\\" + name, )
     for sheet in xl.sheet_names:
         xl_parsed = xl.parse(sheet)

When parsing the sheet, some of the values in the columns are displayed in scientific notation. I don't know the column names before loading so I need to import everything as string. Ideally I would like to be able to do something like xl_parsed = xl.parse(sheet, dtype = str). Any suggestions?

Upvotes: 5

Views: 7793

Answers (2)

michal111
michal111

Reputation: 400

I went with roganjosh's suggestion - open the excel first, get column names and then pass as converter.

if ".xls" in name:
    xl = pd.ExcelFile(path)
    sheetCounter = 1
        for sheet in xl.sheet_names:
        ### Force to read as string ###
        column_list = []
        df_column = pd.read_excel(path, sheetCounter - 1).columns
        for i in df_column:
            column_list.append(i)
        converter = {col: str for col in column_list}
        ##################
        xl_parsed = xl.parse(sheet, converters=converter)
        sheetCounter = sheetCounter + 1

Upvotes: 1

GammaGames
GammaGames

Reputation: 1819

If you would prefer a cleaner solution, I used the following:

excel = pd.ExcelFile(path)
for sheet in excel.sheet_names:
    columns = excel.parse(sheet).columns
    converters = {column: str for column in columns}

    data = excel.parse(sheet, converters=converters)

Upvotes: 3

Related Questions