Reputation: 733
I have begun using the Python library textract to parse text from PowerPoint (.pptx), Word documents (.docx), and text files (.txt). I wrote a simple script to test it.
# Python textract test script
import textract
# Raw string, so the backslashes in the Windows path are not read as escape sequences
text = textract.process(r"H:\My Documents\Test.docx")
When I run it, either on the command line or in IDLE, I get a traceback whose last few lines are:
  File "C:...\textract\parsers\docx_parser.py", line 1, in <module>
    import docx2txt
ImportError: No module named docx2txt
I am using version 1.5.0, downloaded from https://pypi.python.org/pypi/textract. I don't know why it would not include its dependencies. Will I have to install docx2txt and its own dependencies myself? Why would the textract package not contain everything I need?
Upvotes: 1
Views: 8566
Reputation: 39
This worked for me. Open a terminal and run the following commands:
python -m venv env
source ./env/bin/activate
sudo apt update
sudo apt install python-pip && pip install --upgrade pip
sudo apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig
pip install textract
If you run into any errors, try the following:
pip install https://pypi.python.org/packages/ce/c7/ab6cd0d00ddf8dc3b537cfb922f3f049f8018f38c88d71fd164f3acb8416/SpeechRecognition-3.6.3-py2.py3-none-any.whl
sudo apt install libpulse-dev
pip install textract
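Before re-running the install, it can help to see which of the external command-line tools are actually on your PATH. The following is only a rough sketch: the executable names in TOOLS are my assumption about what the apt packages above provide (for example, pdftotext from poppler-utils), and missing_tools is a hypothetical helper, not part of textract.

```python
import shutil

# Executable names assumed to be provided by the apt packages listed above,
# e.g. poppler-utils -> pdftotext, tesseract-ocr -> tesseract.
TOOLS = ["antiword", "unrtf", "pdftotext", "tesseract", "sox", "ffmpeg", "lame", "flac"]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

Anything it reports as missing is a candidate for another sudo apt install.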
Upvotes: 1
Reputation: 28893
textract does not automatically install the dependencies for all the file types it supports. You selectively install the ones you're interested in.
While this is not as elegant as one might imagine it could be, I think it's the appropriate design choice here. Python doesn't have the ability to install dependencies on demand, so the only alternative would be for textract to install all of the dozen or more possible dependencies, which would tend to bloat your Python environment.
So in this case, as Kashyap mentions, the appropriate action is to install the module named in the ImportError:
pip install docx2txt
and similarly for any other file type dependencies you might need.
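If you want to check up front which modules are missing rather than waiting for the next ImportError, something like this might do. It is a minimal sketch, assuming Python 3.4+ (for importlib.util.find_spec); missing_modules is a hypothetical helper, and docx2txt is the only module name taken from the traceback in the question.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # docx2txt comes from the traceback; add the modules for your own file types.
    for name in missing_modules(["textract", "docx2txt"]):
        print("Not installed:", name)
```

Each name it prints is one to try installing with pip.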
Upvotes: 0
Reputation: 17441
I would recommend using pip install xxx to install the module. That will install it in the path that Python usually searches. It should also take care of dependencies.
If you did a manual installation or just extracted it to some other folder, then set your path correctly, as described in How to add to the pythonpath in windows 7? or Python - PYTHONPATH in linux.
If you think you've set it correctly, then post its value, the output of pwd, etc.
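To collect those values in one go, a small script along these lines could help. It is just a sketch; search_path_report is a hypothetical name, and it only reads standard values from sys and os.

```python
import os
import sys


def search_path_report():
    """Collect the values that decide where this interpreter looks for modules."""
    return {
        "executable": sys.executable,                     # which Python is running
        "pythonpath": os.environ.get("PYTHONPATH", ""),   # empty string if unset
        "cwd": os.getcwd(),                               # current working directory
        "sys_path": list(sys.path),                       # effective search path
    }


if __name__ == "__main__":
    for key, value in search_path_report().items():
        print(key, "=", value)
```

Run it with the same interpreter you use for your script, since each Python installation has its own search path.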
Upvotes: 0