Reputation: 279
I am trying to extract data from an invoice. I found that invoice2data can do the job, and I installed it with pip.
from invoice2data import extract_data
The import works.
result = extract_data('sample.pdf')
When I run that line, it raises:
OSError: pdftotext not installed. Can be downloaded from https://poppler.freedesktop.org/
When I tried to pip install pdftotext, it said that Microsoft Visual C++ 14.0 is required. I installed it using the Build Tools, but got the same error again. So I downloaded the files from https://pypi.org/project/pdftotext/ and pasted the extracted files into my anaconda/Lib/site-packages directory. Now when I try to pip install pdftotext, it shows: Requirement already satisfied: pdftotext in c:\users\vicky\anaconda3\lib\site-packages (2.1.2)
Now, when I try to extract data from the PDF, it again shows the same error saying that pdftotext is not installed. How can I get past this error, or is there another package that meets my requirement?
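The error message appears to refer to the `pdftotext` command-line program that ships with Poppler, not the Python package of the same name. That would explain why pip reports the package as installed while invoice2data keeps failing. A minimal diagnostic sketch, using only the standard library, checks the two separately (the function name `diagnose_pdftotext` is hypothetical):

```python
import importlib.util
import shutil


def diagnose_pdftotext():
    """Report whether the pdftotext executable and a pdftotext Python module can be found."""
    # Full path of the command-line program, or None if it is not on PATH
    executable = shutil.which("pdftotext")
    # True if Python can locate a module named pdftotext (it may still fail to import)
    module_found = importlib.util.find_spec("pdftotext") is not None
    return {"executable": executable, "python_module": module_found}


if __name__ == "__main__":
    status = diagnose_pdftotext()
    print("pdftotext executable:", status["executable"] or "NOT FOUND on PATH")
    print("pdftotext Python module found:", status["python_module"])
```

If the module is found but the executable is not, installing Poppler and putting its `bin` folder on `PATH` is the likely fix rather than reinstalling the Python package.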
Thanks in advance.
Upvotes: 0
Views: 4896
Reputation: 763
The dependencies are listed at https://pypi.org/project/pdftotext/. Have a look and install them before running pip install pdftotext.
Upvotes: 0
Reputation: 75
Some simple steps that worked for me:
1. Download and install Visual Studio with the C++ Build Tools, since Microsoft Visual C++ is required: https://visualstudio.microsoft.com/downloads/
2. Download the latest Poppler binaries for Windows: https://blog.alivate.com.au/poppler-windows/index.html
3. Extract the archive and copy the 'poppler' folder that is inside its 'include' folder.
4. Paste this 'poppler' folder into the 'Anaconda3/include/' folder.
5. Then run 'pip install pdftotext'.
You are done!
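These steps build the `pdftotext` Python package. If invoice2data still reports "pdftotext not installed" afterwards, it may be looking for the Poppler `pdftotext` executable on `PATH` instead. A minimal sketch, assuming the Windows Poppler archive from step 2 contains a `bin` folder with `pdftotext.exe` (the `C:\poppler\bin` location and the helper name are hypothetical), adds that folder to `PATH` for the current Python process only:

```python
import os
import shutil

# Hypothetical location: replace with the "bin" folder of the Poppler archive you extracted
POPPLER_BIN = r"C:\poppler\bin"


def add_poppler_to_path(folder):
    """Append folder to this process's PATH (once) and return the pdftotext path, if found."""
    entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if folder not in entries:
        os.environ["PATH"] = os.pathsep.join(entries + [folder])
    # shutil.which honours PATHEXT on Windows, so it also finds pdftotext.exe
    return shutil.which("pdftotext")


if __name__ == "__main__":
    found = add_poppler_to_path(POPPLER_BIN)
    print("pdftotext executable:", found or "still not found; check POPPLER_BIN")
    # Once it is found, invoice2data can be used in the same process:
    # from invoice2data import extract_data
    # result = extract_data("sample.pdf")
```

Adding the folder to the system `PATH` through the Windows environment variable settings makes the change permanent; the sketch only affects the running interpreter.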
Upvotes: -1
Reputation: 5600
Install poppler-utils before pdftotext (on Debian/Ubuntu):
sudo apt-get install poppler-utils
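poppler-utils provides the `pdftotext` command-line tool. To confirm it works from Python, a minimal sketch can call it directly; this is similar in spirit to what invoice2data does, though the exact flags invoice2data passes are an assumption here, and `pdf_to_text` is a hypothetical helper name:

```python
import shutil
import subprocess


def pdf_to_text(pdf_path, executable="pdftotext"):
    """Extract text from a PDF with Poppler's pdftotext, preserving the page layout."""
    exe = shutil.which(executable)
    if exe is None:
        raise FileNotFoundError(f"{executable} not found on PATH; install poppler-utils first")
    # "-layout" keeps the physical layout, "-enc UTF-8" sets the output encoding,
    # and "-" sends the text to stdout instead of writing a .txt file
    completed = subprocess.run(
        [exe, "-layout", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return completed.stdout.decode("utf-8")
```

Usage: `print(pdf_to_text("sample.pdf"))`. If this prints the invoice text, `extract_data('sample.pdf')` should no longer raise the "pdftotext not installed" error.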
Upvotes: 0