Marco C.
Marco C.

Reputation: 187

Pandas read_excel openpyxl generates a ValueError

Recently my code broke, like a week ago. Looks like something is going wrong with the openpyxl dependency. Hoping someone else has this issue and can tell me it's not just me being a bad programmer lol

Edit1:

The excel file I'm reading is generated as a .xlsx from Seeking Alpha's Portfolio Excel Export feature.

Edit2:

The Excel Export file now contains an added row with 2 empty conditionally formatted cells to a sheet that I don't even pass through to the sheet_name arg. The problem seems to be that openpyxl can't parse empty cells that have conditional formatting. How can I make read_excel only parse the sheets I listed? Or maybe drop the row that's causing problems?

After adding a "-" or removing conditional formatting, my script works. But I'd like to not have to do this every time I export the excel file. Also, the following warning appears now.

/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/bin/python3.9 /Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py
/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py:308: 
UserWarning: Conditional Formatting extension is not supported and will be removed
  warn(msg)

My line of code that causes error

dic = pd.read_excel(path, sheet_name=sheet_names)

Error message

ValueError: Value must be one of {'equal', 'greaterThanOrEqual', 'containsText', 'beginsWith', 'notEqual', 'greaterThan', 'between', 'endsWith', 'notContains', 'lessThan', 'lessThanOrEqual', 'notBetween'}

Complete Log

/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/bin/python3.9 /Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py
Traceback (most recent call last):
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py", line 862, in <module>
    df = excel_data(path_excel, sheet_names)
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py", line 129, in excel_data
    dic = pd.read_excel(path, sheet_name=sheet_names)
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 344, in read_excel
    data = io.parse(
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1170, in parse
    return self._reader.parse(
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 492, in parse
    data = self.get_sheet_data(sheet, convert_float)
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 548, in get_sheet_data
    for row_number, row in enumerate(sheet.rows):
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 79, in _cells_by_row
    for idx, row in parser.parse():
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 145, in parse
    dispatcher[tag_name](element)
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 288, in parse_formatting
    cf = ConditionalFormatting.from_tree(element)
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
    return cls(**attrib)
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/formatting/rule.py", line 201, in __init__
    self.operator = operator
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 143, in __set__
    super(NoneSet, self).__set__(instance, value)
  File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 128, in __set__
    raise ValueError(self.__doc__)
ValueError: Value must be one of {'equal', 'greaterThanOrEqual', 'containsText', 'beginsWith', 'notEqual', 'greaterThan', 'between', 'endsWith', 'notContains', 'lessThan', 'lessThanOrEqual', 'notBetween'}

Upvotes: 0

Views: 2638

Answers (1)

qunal
qunal

Reputation: 11

I have same problem with Seeking Alpha (SA) xlsx. pandas 1.1.5 does read ok but current pandas 1.3.2 has this problem. A possible hack workaround is to install older pandas in a different location and import that pandas version 1.1.5 as pd1 just to import the xlsx . So pd is pandas 1.3.2 and pd1 is pandas 1.1.5 The following has some hints Supporting multiple Python module versions (with the same version of Python)

And use the target option in pip import as explained in Supporting multiple Python module versions (with the same version of Python)

Edited to add sample code Pandas is a large subsystem and multiple versions are a bit demanding to install. A simpler solution was to create a different environment ( I used conda) and execute a command ( .bat in windows .sh in unix)

sample SA_oldpandas.bat

echo old pandas 1.1.5
conda activate env-oldpandas 

cd {code directory}
echo ---GOING----
python SA_OldPandas.py
echo ---FINISHED---- %date% %time%
start excel SA_check.xlsx

basically the old Pandas code writes the data to a excel and the new code uses this

os.getcwd()
os.system('SA_oldpandas.bat')

Upvotes: 1

Related Questions