Reputation: 11
I am following the tutorial on Haystack's website for building an extractive QA system, and I am trying to convert a PDF to text. The blog post is here: https://www.deepset.ai/blog/automating-information-extraction-with-question-answering
I installed haystack with pip, but the import fails with the error below. I also tried !pip install haystack.nodes, but that doesn't work either.
Note: I am using Google Colab for this.
Here is my detailed code and error:
!pip -q install haystack haystack.nodes
path = '/content/drive/MyDrive/Colab Notebooks/NLP/Information Extraction QA with Haystack (Adidas Financial corpus)'
from haystack.nodes import PDFToTextConverter
pdf_converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=['en'])
converted = pdf_converter.convert(file_path = path, meta = { 'company': 'Company_1', 'processed': False })
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-7-61021fb3b7b8> in <cell line: 1>()
----> 1 from haystack.nodes import PDFToTextConverter
2
3 pdf_converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=['en'])
4
5 converted = pdf_converter.convert(file_path = path, meta = { 'company': 'Company_1', 'processed': False })
Upvotes: 1
Views: 6457
Reputation: 147
I had the same issue and had tried all of the solutions above. The fix turned out to be very simple in my case: I was installing haystack from Jupyter code cells in Visual Studio Code, and I just had to restart my editor 🫠🤦. To be safe, I also uninstalled and reinstalled the haystack-ai Python library:
pip uninstall -y farm-haystack haystack-ai
pip install haystack-ai
After that, import haystack (or any of its submodules) should work. Voilà 🎆, it's done.
Upvotes: 0
Reputation: 97
Note that installing farm-haystack and haystack-ai in the same Python environment (virtualenv, Colab, or system) causes problems. In my case, I also had to enable the "Telemetry" environment.
These steps solved the problem for me:
!pip uninstall -y farm-haystack haystack-ai
!pip install --upgrade pip
!pip install farm-haystack[colab,ocr,preprocessing,file-conversion,pdf]
Then, I enabled the "Telemetry" environment by adding these lines at the top of my script:
from haystack.telemetry import tutorial_running
tutorial_running(8)
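To see whether the two packages really are installed side by side before uninstalling anything, a small check like the following may help. It is only a sketch: the helper name installed_versions and the CANDIDATES tuple are made up for illustration, and the bare name haystack is included only because that is what the question installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names to look for. "haystack" is listed only because the
# question installed it; the other two are the names discussed in this thread.
CANDIDATES = ("farm-haystack", "haystack-ai", "haystack")


def installed_versions(names=CANDIDATES):
    """Return {distribution name: version} for the names that are installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found


found = installed_versions()
print(found)
if "farm-haystack" in found and "haystack-ai" in found:
    print("Both farm-haystack and haystack-ai are installed in this environment.")
```

Run it in the same notebook or kernel that raises the import error, since a different interpreter may see a different set of packages.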
Upvotes: 0
Reputation: 271
To install Haystack, you need to run pip install farm-haystack. The PyPI package is called farm-haystack, not just haystack, as Stefano mentioned.
A good starting point is the Haystack tutorials, which you can run as Python notebooks on Google Colab, for example this tutorial using the PDFToTextConverter.
Upvotes: 1
Reputation: 109
Do not name any of your own files haystack.py; otherwise the import will fail, because Python picks up your file instead of the library. This goes for all projects: never give a file the same name as a library you import. ;-)
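One way to check for this kind of shadowing is to ask Python which file it would load for a given module name, without actually importing it. This is a minimal sketch; module_origin is a hypothetical helper name, not part of Haystack.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the path a top-level `import name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is the file backing the module (None for namespace packages).
    return spec.origin


# If this prints a path inside your own project (for example ./haystack.py)
# rather than a site-packages directory, a local file is shadowing the library.
print(module_origin("haystack"))
```

If the printed path points into your project folder, renaming that file (and removing any stale __pycache__ entry for it) should let the real library be imported again.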
Upvotes: 0