I'm using Azure Databricks and trying to read an Excel file. The file is encrypted and has the extension .xlsx.pgp. After decrypting it, I get the contents as a byte array, so this is the code I use to read it into a pandas DataFrame:
df = pd.read_excel(BytesIO(orig))
However, this is giving me the following error:
XLRDError: Excel xlsx file; not supported
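For context: xlrd 2.0 and later only read legacy .xls files, so this error appears when an .xlsx file ends up being parsed by the xlrd engine (the default in older pandas releases). A quick way to check what the decrypted bytes actually contain is to look at their leading "magic" bytes: .xlsx files are ZIP archives starting with PK\x03\x04, while legacy .xls files are OLE2 compound documents starting with D0 CF 11 E0. The sketch below uses a hypothetical helper, guess_excel_engine (not part of pandas), to map those signatures to an engine name.

```python
import zipfile
from io import BytesIO

# Leading "magic" bytes of the two Excel container formats.
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx (Office Open XML, a ZIP archive)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)


def guess_excel_engine(data: bytes) -> str:
    """Hypothetical helper: suggest a pandas read_excel engine for raw bytes."""
    if data.startswith(ZIP_MAGIC):
        # An .xlsx workbook is a ZIP archive that contains xl/workbook.xml.
        with zipfile.ZipFile(BytesIO(data)) as zf:
            if "xl/workbook.xml" in zf.namelist():
                return "openpyxl"
        raise ValueError("ZIP archive, but not an .xlsx workbook")
    if data.startswith(OLE2_MAGIC):
        return "xlrd"  # xlrd 2.x still reads legacy .xls files
    raise ValueError("unrecognized format; was the file fully decrypted?")
```

Usage: call guess_excel_engine(orig) on the decrypted bytes and pass the result as engine= to pd.read_excel(BytesIO(orig), ...). If it raises ValueError, the decryption step probably did not produce a valid workbook.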
Based on this documentation, I added openpyxl to the cluster and then ran the following:
df = pd.read_excel(BytesIO(orig),engine=`openpyxl`)
I'm getting the error:
global name 'openpyxl' is not defined
When I quote the engine name instead:
df = pd.read_excel(BytesIO(orig),engine='openpyxl')
The error I get is:
ValueError: Unknown engine: openpyxl
How can I resolve this issue?
Thanks for all the help!
These errors suggest that the openpyxl library is not properly installed, or that it is not available in the environment the notebook runs in.
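Before reinstalling anything, it can help to confirm from inside the notebook whether its Python interpreter can actually see the library. This is a minimal sketch assuming a Python 3 notebook; is_importable is a hypothetical helper name, while importlib.util.find_spec is standard library.

```python
import importlib.util
import sys


def is_importable(name: str) -> bool:
    """Return True if `name` can be found by this interpreter's import system."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Report which Excel-related packages the current interpreter can import.
    for pkg in ("pandas", "openpyxl", "xlrd"):
        status = "found" if is_importable(pkg) else "NOT found"
        print(f"{pkg}: {status} (interpreter: {sys.executable})")
```

If openpyxl shows as NOT found here, the notebook cannot use engine='openpyxl' no matter which pandas version is installed.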
Install openpyxl on the cluster that the notebook is attached to:
Step 1: Select the cluster and click Libraries.
Step 2: Click Install New, choose PyPI, enter the library name openpyxl, and click Install.
Step 3: Check the status of the openpyxl library to confirm that it is installed.
Step 4: The openpyxl library is now successfully installed.
Edit -
Note: the pandas version should be 1.0.1 or above. Check the installed version with pd.__version__. If it is below 1.0.1, upgrade pandas with pip install --upgrade pandas (plain pip install pandas does not upgrade a package that is already installed).
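The version check can be scripted. This is a simplified sketch with hypothetical helper names (version_tuple, meets_minimum) that compares only the numeric major.minor.patch parts and ignores pre-release tags; if the packaging library is installed, packaging.version.Version handles the full version-string grammar more robustly.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn '1.0.1' or '2.0.0rc1' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad '1.3' to (1, 3, 0)
    return tuple(parts)


def meets_minimum(version: str, minimum: str = "1.0.1") -> bool:
    """True if `version` is at least `minimum` (pre-release tags are ignored)."""
    return version_tuple(version) >= version_tuple(minimum)
```

Usage in the notebook: import pandas as pd, then meets_minimum(pd.__version__). A False result means pandas needs upgrading before engine='openpyxl' is accepted.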
For more information, refer to this answer from rama-a.