jukebox

Reputation: 463

XLRDError: Excel xlsx file; not supported Databricks

I'm using Azure Databricks and trying to read an Excel file. The file is encrypted, with the extension .xlsx.pgp. After decrypting it, I get the contents as a byte array. Here is the call I use to read it into a pandas DataFrame:

df = pd.read_excel(BytesIO(orig))

However, this is giving me the following error:

XLRDError: Excel xlsx file; not supported

Now, based on this documentation:

I have added openpyxl to the cluster and then tried to run the following:

df = pd.read_excel(BytesIO(orig),engine=`openpyxl`)

I'm getting the error:

global name 'openpyxl' is not defined

If I pass the engine name as a quoted string instead:

df = pd.read_excel(BytesIO(orig),engine='openpyxl')

The error I get is:

ValueError: Unknown engine: openpyxl

How can I resolve this issue?

Thanks for all the help!

Upvotes: 1

Views: 1371

Answers (1)

Abhishek Khandave

Reputation: 3240

The errors suggest that the openpyxl library is not properly installed. It is also possible that the library is installed but not visible to the notebook.

Please install openpyxl on the cluster that the notebook is attached to:

Step 1: Select the cluster and click on Libraries.

Step 2: Click on Install New, then click on PyPI, enter the library name openpyxl, and click on Install.

Step 3: Check that the status of the openpyxl library shows as installed.

Step 4: The openpyxl library is now installed on the cluster.
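After the installation, it can help to confirm from the attached notebook that the library is actually visible to the Python interpreter. The following is only a minimal sketch, assuming Python 3.8 or later; the helper name installed_version is hypothetical and not part of pandas, openpyxl, or Databricks.

```python
import importlib.util
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it cannot be imported."""
    # find_spec returns None when a top-level module is not on the import path.
    if importlib.util.find_spec(package_name) is None:
        return None
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (for example a stdlib module).
        return "unknown"


# Report what the notebook's interpreter can see.
for name in ("openpyxl", "pandas"):
    print(name, "->", installed_version(name))
```

If openpyxl prints as None here even though the cluster page lists it as installed, the notebook is probably not using the environment the library was installed into.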


Edit -

Note: the pandas version should be 1.0.1 or above.

You can check the pandas version with pd.__version__.

If it is below 1.0.1, upgrade pandas with pip install --upgrade pandas (a plain pip install pandas does not upgrade a copy that is already installed).
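The version check can also be done programmatically. This is a rough sketch only: the helpers version_tuple and meets_minimum are hypothetical names, the 1.0.1 threshold is the one given in this answer, and the parsing ignores pre-release suffixes such as rc1.

```python
import re


def version_tuple(version):
    """Convert a version string such as '1.0.1' into a tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="1.0.1"):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)
```

In a notebook this would be called as meets_minimum(pd.__version__) after import pandas as pd.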

For more information, you can refer to this answer from rama-a.
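Putting it together, reading the decrypted bytes might look like the sketch below. It assumes pandas and openpyxl are both installed on the cluster; the function name read_xlsx_bytes and the ZIP signature check are illustrative additions, not part of the original question (an .xlsx file is a ZIP archive, so its bytes normally start with PK\x03\x04).

```python
from io import BytesIO

XLSX_MAGIC = b"PK\x03\x04"  # ZIP local-file header; .xlsx files are ZIP archives


def read_xlsx_bytes(payload):
    """Read decrypted .xlsx bytes into a pandas DataFrame using openpyxl."""
    # Fail early if the decrypted payload does not look like an .xlsx file.
    if not payload.startswith(XLSX_MAGIC):
        raise ValueError("payload does not look like an .xlsx (ZIP) file")
    import pandas as pd  # imported lazily so the check above needs no pandas

    # The engine name must be a quoted string, not a bare name or backticks.
    return pd.read_excel(BytesIO(payload), engine="openpyxl")
```

With the variable from the question, the call would be df = read_xlsx_bytes(orig).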

Upvotes: 2
