Reputation: 273
I'm trying to get NLTK and wordnet working on Heroku. I've already done
heroku run python
nltk.download()
wordnet
pip install -r requirements.txt
But I get this error:
Resource 'corpora/wordnet' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/app/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
Yet I've looked in /app/nltk_data and it's there, so I'm not sure what's going on.
Upvotes: 26
Views: 64550
Reputation: 1
"I had the same issue, and then I noticed that these two libraries don't unzip automatically; we have to unzip them manually."
Upvotes: 0
Reputation: 1
**This answer does not belong to me; I found it somewhere on the internet.** Unzip the wordnet.zip file manually using the following code:
!unzip /usr/share/nltk_data/corpora/wordnet.zip -d /usr/share/nltk_data/corpora/
You may check these notebooks for reference: https://www.kaggle.com/code/shivanimalhotra91/nlp-using-glove-embeddings or https://www.kaggle.com/code/shivanimalhotra91/gensim-word2vec-lstm-95-accuracy
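If a `!unzip` shell escape isn't available, the same manual extraction can be done from Python with the standard library. A small sketch (the helper name `unzip_corpus` is mine, not from the answer):

```python
import zipfile

def unzip_corpus(zip_path, dest_dir):
    # NLTK's downloader can leave the corpus as e.g. wordnet.zip; the
    # data loader wants the extracted wordnet/ directory next to it.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)

# e.g. unzip_corpus("/usr/share/nltk_data/corpora/wordnet.zip",
#                   "/usr/share/nltk_data/corpora/")
```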
Upvotes: 0
Reputation: 29
In my case, after running
import nltk
nltk.download('wordnet')
it did not work. The issue was that wordnet.zip was unable to unzip on its own, so simply go to the folder where it was downloaded and unzip it manually:
cd ~
cd nltk_data/corpora/
unzip wordnet.zip
Alternatively, python3 -m textblob.download_corpora installed the package and unzipped the folder for me.
Upvotes: 2
Reputation: 838
On Mac:
I still needed to download the omw-1.4 data. The code was run from a Python file, and the nltk_data/ directory is in the same directory as the Python file.
nltk.download('wordnet', "nltk_data/")
nltk.download('omw-1.4', "nltk_data/")
nltk.data.path.append('nltk_data/')
Upvotes: 0
Reputation: 37
I was also facing this issue when I tried to run lemmatizer.lemmatize('goes'). It's actually because the packages have not been downloaded, so try downloading them with the following code; it may solve many problems like this:
nltk.download('wordnet')
nltk.download('omw-1.4')
Thank you.
Upvotes: 0
Reputation: 486
I faced the same problem, tried this solution, and it works. I just put:
import nltk
nltk.download('wordnet')
above my code, and it ran without problems. So try it; maybe it helps you.
Upvotes: 1
Reputation:
I faced the same error. This workaround by Fred Foo helped me fix the issue. The following works for me:
# 1) execute the code below
# 2) an NLTK Download window will open
# 3) select the "Corpora" tab and scroll down to "wordnet"
# 4) double-click to install
nltk.download()
from nltk.corpus import wordnet
Upvotes: 0
Reputation: 11
I know this is an old question, but since the "right" answer has changed thanks to Heroku offering support for nltk, I thought it might be worthwhile to answer.
Heroku now supports nltk. If you need to download something for nltk (wordnet in this example, or perhaps stopwords or another corpus), you can do so by simply including an nltk.txt file in the same root directory where you have your Procfile and requirements.txt. In your nltk.txt file you list each item you would like to download. For a project I just deployed I needed stopwords and wordnet, so my nltk.txt looks like this:
stopwords
wordnet
Pretty straightforward. And, of course, make sure you have the appropriate version of nltk specified in your Pipfile or requirements.txt. For the ground truth, visit https://devcenter.heroku.com/articles/python-nltk.
Upvotes: 1
Reputation: 31
I faced the exact same problem while deploying a chatbot on the Heroku platform. Although the answer from follyroof is a fool-proof solution, in many cases the size of the repository would increase drastically.
So I used nltk.download('PACKAGE') in my app.py file. This way, whenever app.py is run, the dependencies are automatically downloaded.
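To avoid re-downloading on every dyno restart, the download can be guarded by a presence check against NLTK's search paths. A sketch, assuming the standard corpora/ layout; the helper `corpus_present` is mine, not from the answer, and the nltk calls are commented out because they hit the network:

```python
import os

def corpus_present(corpus, search_paths):
    # A corpus counts as installed if corpora/<name> is an extracted
    # directory on any of the searched paths (mirrors NLTK's layout).
    return any(
        os.path.isdir(os.path.join(p, "corpora", corpus))
        for p in search_paths
    )

# Typical use at the top of app.py:
# import nltk
# if not corpus_present("wordnet", nltk.data.path):
#     nltk.download("wordnet")
```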
Upvotes: 0
Reputation: 13292
This one works:
For Mac OS users.
python -m nltk.downloader -d /usr/local/share/nltk_data wordnet
Upvotes: 2
Reputation: 880
As Kenneth Reitz pointed out, a much simpler solution has been added to the heroku-python-buildpack. Add a nltk.txt file to your root directory and list your corpora inside. See https://devcenter.heroku.com/articles/python-nltk for details.
Here's a cleaner solution that allows you to install the NLTK data directly on Heroku without adding it to your git repo.
I used similar steps to install Textblob on Heroku, which uses NLTK as a dependency. I've made some minor adjustments to my original code in steps 3 and 4 that should work for an NLTK-only installation.
The default heroku buildpack includes a post_compile step that runs after all of the default build steps have been completed:
# post_compile
#!/usr/bin/env bash
if [ -f bin/post_compile ]; then
echo "-----> Running post-compile hook"
chmod +x bin/post_compile
sub-env bin/post_compile
fi
As you can see, it looks in your project directory for your own post_compile file in the bin directory, and it runs it if it exists. You can use this hook to install the nltk data.
Create the bin directory in the root of your local project.
Add your own post_compile file to the bin directory.
# bin/post_compile
#!/usr/bin/env bash
if [ -f bin/install_nltk_data ]; then
echo "-----> Running install_nltk_data"
chmod +x bin/install_nltk_data
bin/install_nltk_data
fi
echo "-----> Post-compile done"
Add your own install_nltk_data file to the bin directory.
# bin/install_nltk_data
#!/usr/bin/env bash
source $BIN_DIR/utils
echo "-----> Starting nltk data installation"
# Assumes NLTK_DATA environment variable is already set
# $ heroku config:set NLTK_DATA='/app/nltk_data'
# Install the nltk data
# NOTE: The following command installs the wordnet corpora,
# so you may want to change for your specific needs.
# See http://www.nltk.org/data.html
python -m nltk.downloader wordnet
# If using Textblob, use this instead:
# python -m textblob.download_corpora lite
# Open the NLTK_DATA directory
cd ${NLTK_DATA}
# Delete all of the zip files
find . -name "*.zip" -type f -delete
echo "-----> Finished nltk data installation"
Add nltk to your requirements.txt file (or textblob if you are using Textblob).
Commit all of these changes to your repo.
Set the NLTK_DATA environment variable on your heroku app.
$ heroku config:set NLTK_DATA='/app/nltk_data'
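For completeness: NLTK reads the NLTK_DATA environment variable when it builds its search path, so setting the config var is all the Python side needs. The in-process equivalent for local testing might look like this sketch (the /app/nltk_data value is the one assumed in the steps above):

```python
import os

# NLTK consults the NLTK_DATA environment variable at import time and
# adds it to its search path, so setting it here (or via
# `heroku config:set`) requires no further code changes.
os.environ.setdefault("NLTK_DATA", "/app/nltk_data")

# import nltk  # must come after the variable is set
print(os.environ["NLTK_DATA"])
```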
Deploy to Heroku. You will see the post_compile step trigger at the end of the deployment, followed by the nltk download.
I hope you found this helpful! Enjoy!
Upvotes: 6
Reputation: 8845
Heroku now officially supports NLTK data, built-in!
https://devcenter.heroku.com/articles/python-nltk
Upvotes: 1
Reputation: 3530
I just had this same problem. What ended up working for me is creating an 'nltk_data' directory in the application's folder itself, downloading the corpus to that directory and adding a line to my code that lets the nltk know to look in that directory. You can do this all locally and then push the changes to Heroku.
So, supposing my python application is in a directory called "myapp/"
Step 1: Create the directory
cd myapp/
mkdir nltk_data
Step 2: Download Corpus to New Directory
python -m nltk.downloader
This'll pop up the nltk downloader. Set your Download Directory to whatever_the_absolute_path_to_myapp_is/nltk_data/. If you're using the GUI downloader, the download directory is set through a text field at the bottom of the UI. If you're using the command line one, you set it in the config menu.
Once the downloader knows to point to your newly created nltk_data directory, download your corpus.
Or in one step from Python code:
nltk.download("wordnet", "whatever_the_absolute_path_to_myapp_is/nltk_data/")
Step 3: Let nltk Know Where to Look
nltk looks for data, resources, etc. in the locations specified in the nltk.data.path variable. All you need to do is add nltk.data.path.append('./nltk_data/') to the python file actually using nltk, and it will look for corpora, tokenizers, and such in there in addition to the default paths.
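Since a relative path like './nltk_data/' resolves against the current working directory (which on Heroku happens to be /app, but locally can be anywhere), a slightly safer variant is to build the path from the file's own location. A sketch; the nltk lines are commented out because they need the package installed:

```python
import os

# Resolve nltk_data relative to this file instead of the process's
# working directory. __file__ is unset in a REPL, hence the fallback.
try:
    HERE = os.path.dirname(os.path.abspath(__file__))
except NameError:
    HERE = os.getcwd()
NLTK_DATA_DIR = os.path.join(HERE, "nltk_data")

# nltk.data.path is a plain list searched in order, so appending keeps
# the default locations as a fallback:
# import nltk
# nltk.data.path.append(NLTK_DATA_DIR)
```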
Step 4: Send it to Heroku
git add nltk_data/
git commit -m 'super useful commit message'
git push heroku master
That should work! It did for me anyway. One thing worth noting is that the path from the python file executing nltk stuff to the nltk_data directory may be different depending on how you've structured your application, so just account for that when you do nltk.data.path.append('path_to_nltk_data').
Upvotes: 65
Reputation: 5139
For Mac OS users only.
python -m nltk.downloader -d /usr/share/nltk_data wordnet
The corpora data can't be downloaded directly to the /usr/share/nltk_data folder; the error reports "no permission". There are two solutions:
1. Add an additional permission change to the Mac system; for details, refer to Operation Not Permitted when on root El capitan (rootless disabled). However, I didn't want to change the Mac's default settings just for this corpus, so I went for the second solution.
2. Add the path to the nltk path. In your py file, add the following lines:
import nltk
nltk.data.path.append('nltk_data')
Upvotes: 5
Reputation: 133
I was getting this issue. Those who are not working in a virtual environment will need to download to the following directory on Ubuntu:
/usr/share/nltk_data/corpora/wordnet
Instead of wordnet it could be brown or whatever. You can run this command directly in your terminal if you want to download the corpus:
$ sudo python -m nltk.downloader -d /usr/share/nltk_data wordnet
Again, instead of wordnet it could be brown.
Upvotes: 2