ajg2356

Reputation: 11

What is causing AttributeError: 'list' object has no attribute 'read' when trying to read in a PDF with Tabula?

I am attempting to use Tabula to pull table information from a PDF and convert it to a pandas DataFrame. I have been following the steps in this tutorial:

https://aegis4048.github.io/parse-pdf-files-while-retaining-structure-with-tabula-py

When I try to load the remote PDF into my jupyter notebook with the following code (taken directly from the tutorial):

import tabula
df2 = tabula.read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")

I get the error:

AttributeError: 'list' object has no attribute 'read'

I have also tried reading PDFs saved locally on my machine, and I get the same error. I believe I have installed Java and configured the environment variable correctly, and I have the most recent version of Tabula.

Link to screenshot from my jupyter notebook:

https://www.dropbox.com/s/y44mfzuclihfdau/S_O_Capture_1.PNG?dl=0

Thanks.

Upvotes: 1

Views: 1856

Answers (1)

Lord Elrond

Reputation: 16072

Make sure you installed the right tabula package!

If you ran pip3 install tabula, then you installed an imposter!

Run pip3 uninstall tabula to remove it, then run:

pip3 install tabula-py

to install the correct package.
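
If you are not sure which of the two you have, a quick check along these lines may help. It is only a sketch: it assumes Python 3.8+ (for importlib.metadata) and assumes the two PyPI distributions are named tabula and tabula-py as above. The helper names installed_version and diagnose are made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose(wrong_version, right_version):
    """Turn the two lookups into a short hint (hypothetical helper)."""
    if wrong_version is not None:
        # Both distributions are imported as `tabula`, so the wrong one
        # can shadow the right one even when tabula-py is also present.
        return "Found 'tabula'; uninstall it, then install tabula-py."
    if right_version is None:
        return "tabula-py is not installed; run pip3 install tabula-py."
    return "Only tabula-py " + right_version + " is installed; this looks fine."


if __name__ == "__main__":
    print(diagnose(installed_version("tabula"), installed_version("tabula-py")))
```

Run it with the same interpreter your Jupyter kernel uses, otherwise it may be inspecting a different environment than the one raising the error.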

Upvotes: 1
