Bala
Bala

Reputation: 21

Pandas pd.readexcel not able to read cell values starting with "#" (Formula Errors)

I am trying to read excel file using pandas pd.read_excel and data in cells contains text values as formula errors. My requirement is to read the formula errors as text and not replace it with NA

Example: #NA, #DIV/0! #Value

While reading this data using pd.read_excel all the data is converted to NA instead of reading the data as is.

I did try using openpyxl but for my overall requirement, pandas read excel is apt because I get excel files with different formats.

Code:

import pandas as pd

xlsx = pd.ExcelFile('Book1.xlsx')
df = pd.read_excel(xlsx, sheet_name ='Sheet1' ,engine_kwargs={"values_only":True})

Source:

1,0, #DIV/0!(Formula Error), #DIV/0! (Text)

Result:

LHS RHS Formula Ftext 0 1 0 NaN NaN

Upvotes: 1

Views: 137

Answers (1)

mozway
mozway

Reputation: 261830

#NA should be interpreted as NaN, but not #DIV/0!/#Value

If you don't want read_excel to parse NaNs you have several options:

pd.read_excel('your_file.xlsx', na_filter=False)

Which will keep all strings intact (including empty strings).

Or a combination of keep_default_na and na_values. For example to ignore everything as NaN except empty strings:

pd.read_excel('your_file.xlsx',  keep_default_na=False, na_values=[''])

Upvotes: 0

Related Questions