Reputation: 11
I am attempting to use Tabula to pull table information from a PDF and convert it to a pandas DataFrame. I have been following the steps in this tutorial:
https://aegis4048.github.io/parse-pdf-files-while-retaining-structure-with-tabula-py
When I try to load the remote PDF into my Jupyter notebook with the following code (taken directly from the tutorial):
import tabula
df2 = tabula.read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")
I get the error:
AttributeError: 'list' object has no attribute 'read'
I have also tried reading PDFs saved locally on my machine, and I get the same error. I believe I have installed Java and configured the environment variable correctly, and I have the most recent version of Tabula.
Link to a screenshot from my Jupyter notebook:
https://www.dropbox.com/s/y44mfzuclihfdau/S_O_Capture_1.PNG?dl=0
Thanks.
Upvotes: 1
Views: 1856
Reputation: 16072
Make sure you installed the right tabula package!
If you ran pip3 install tabula, then you installed an imposter! Run pip3 uninstall tabula to remove it, then run:
pip3 install tabula-py
to install the correct package.
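If you are not sure which of the two packages pip actually installed, a quick check from Python may help. The snippet below is only a sketch: it assumes Python 3.8 or newer (for importlib.metadata), and the helper names installed_version and advice are made up for illustration. It looks up the two distribution names mentioned above and prints a suggestion.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def advice(wrong_version, right_version):
    """Turn the two lookup results into a suggestion (hypothetical helper)."""
    if wrong_version is not None:
        # The similarly named 'tabula' distribution is present and should go first.
        return "uninstall tabula, then install tabula-py"
    if right_version is None:
        return "install tabula-py"
    return "tabula-py " + right_version + " is installed"


# 'tabula' is the package to remove; 'tabula-py' is the one that provides read_pdf.
print(advice(installed_version("tabula"), installed_version("tabula-py")))
```

Run it with the same interpreter (or Jupyter kernel) that raises the error, since pip3 may be installing into a different environment than the one the notebook uses.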
Upvotes: 1