Reputation: 187
Recently my code broke, like a week ago. Looks like something is going wrong with the openpyxl
dependency. Hoping someone else has this issue and can tell me it's not just me being a bad programmer lol
The excel file I'm reading is generated as a .xlsx
from Seeking Alpha's Portfolio Excel Export feature.
The Excel Export file now contains an added row with 2 empty conditionally formatted cells to a sheet that I don't even pass through to the sheet_name
arg. The problem seems to be that openpyxl
can't parse empty cells that have conditional formatting. How can I make read_excel
only parse the sheets I listed? Or maybe drop the row that's causing problems?
After adding a "-" or removing conditional formatting, my script works. But I'd like to not have to do this every time I export the excel file. Also, the following warning appears now.
/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/bin/python3.9 /Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py
/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py:308:
UserWarning: Conditional Formatting extension is not supported and will be removed
warn(msg)
My line of code that causes error
dic = pd.read_excel(path, sheet_name=sheet_names)
Error message
ValueError: Value must be one of {'equal', 'greaterThanOrEqual', 'containsText', 'beginsWith', 'notEqual', 'greaterThan', 'between', 'endsWith', 'notContains', 'lessThan', 'lessThanOrEqual', 'notBetween'}
Complete Log
/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/bin/python3.9 /Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py
Traceback (most recent call last):
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py", line 862, in <module>
df = excel_data(path_excel, sheet_names)
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/analysis.py", line 129, in excel_data
dic = pd.read_excel(path, sheet_name=sheet_names)
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 344, in read_excel
data = io.parse(
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1170, in parse
return self._reader.parse(
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 492, in parse
data = self.get_sheet_data(sheet, convert_float)
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 548, in get_sheet_data
for row_number, row in enumerate(sheet.rows):
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 79, in _cells_by_row
for idx, row in parser.parse():
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 145, in parse
dispatcher[tag_name](element)
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 288, in parse_formatting
cf = ConditionalFormatting.from_tree(element)
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
obj = desc.expected_type.from_tree(el)
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
return cls(**attrib)
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/formatting/rule.py", line 201, in __init__
self.operator = operator
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 143, in __set__
super(NoneSet, self).__set__(instance, value)
File "/Users/marcoucolon/Documents/GitHub/opt-portfolio/.venv/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 128, in __set__
raise ValueError(self.__doc__)
ValueError: Value must be one of {'equal', 'greaterThanOrEqual', 'containsText', 'beginsWith', 'notEqual', 'greaterThan', 'between', 'endsWith', 'notContains', 'lessThan', 'lessThanOrEqual', 'notBetween'}
Upvotes: 0
Views: 2638
Reputation: 11
I have same problem with Seeking Alpha (SA) xlsx. pandas 1.1.5 does read ok but current pandas 1.3.2 has this problem. A possible hack workaround is to install older pandas in a different location and import that pandas version 1.1.5 as pd1 just to import the xlsx . So pd is pandas 1.3.2 and pd1 is pandas 1.1.5 The following has some hints Supporting multiple Python module versions (with the same version of Python)
And use the target option in pip import as explained in Supporting multiple Python module versions (with the same version of Python)
Edited to add sample code Pandas is a large subsystem and multiple versions are a bit demanding to install. A simpler solution was to create a different environment ( I used conda) and execute a command ( .bat in windows .sh in unix)
sample SA_oldpandas.bat
echo old pandas 1.1.5
conda activate env-oldpandas
cd {code directory}
echo ---GOING----
python SA_OldPandas.py
echo ---FINISHED---- %date% %time%
start excel SA_check.xlsx
basically the old Pandas code writes the data to a excel and the new code uses this
os.getcwd()
os.system('SA_oldpandas.bat')
Upvotes: 1