Reputation: 4939
I am trying to install the nltk corpora with these commands, as mentioned in the documentation:
import nltk
nltk.download()
However, I am doing this from inside my stupid organization, which has blocked GitHub, and GitHub is what the download function above tries to connect to.
Is there an alternate repository for the nltk data that I can try instead? Trying to get GitHub and its associated websites whitelisted will only get tangled in red tape.
Thank you
Upvotes: 5
Views: 2679
Reputation: 50220
The layout of the nltk data is pretty straightforward. Run nltk.download() on a computer that has access to GitHub, download the resources you are interested in (if you don't know yet, I recommend the "book" bundle), then find the generated nltk_data folder and copy the whole hierarchy to your work computer, at a location where nltk can find it (e.g., see where the downloader tried to install it).
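The copy step can be scripted. Below is a minimal sketch, assuming the nltk_data folder has already been carried over (USB stick, network share, etc.); the helper names `copy_nltk_data` and `register_nltk_data` are made up for illustration and are not part of nltk. The second helper appends the folder to `nltk.data.path`, the list nltk searches for data; setting the `NLTK_DATA` environment variable before importing nltk should work as well.

```python
import shutil
from pathlib import Path


def copy_nltk_data(src, dst):
    """Copy an existing nltk_data hierarchy from src to dst, keeping its layout.

    Hypothetical helper for illustration; not part of nltk.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"no nltk_data folder at {src}")
    # dirs_exist_ok (Python 3.8+) merges into dst if it already exists.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


def register_nltk_data(path):
    """Add `path` to nltk's search list, if nltk is installed.

    Returns True when nltk was importable, False otherwise.
    """
    try:
        import nltk
    except ImportError:
        return False
    if str(path) not in nltk.data.path:
        nltk.data.path.append(str(path))
    return True
```

Typical use would be `register_nltk_data(copy_nltk_data("/media/usb/nltk_data", "/home/username/nltk_data"))`, where both paths are placeholders.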
Upvotes: 0
Reputation: 535
Due to issue 1787, I started building RPMs in the openSUSE Build Service (OBS) repository home:jayvdb:nltk_data.
For example, for the punkt data, the .spec file is here. It is very easy to copy that for other data packs.
To install from OBS on Fedora Rawhide:
dnf config-manager --add-repo http://download.opensuse.org/repositories/home:jayvdb:nltk_data/Fedora_Rawhide/home:jayvdb:nltk_data.repo
dnf install nltk-data-punkt
More download instructions are available from the OBS download page.
Upvotes: 0
Reputation: 535
There was a brief period when GitHub actually blocked all fetches of nltk_data, resulting in issue 1787, which is still open and contains many workarounds, as well as plans to avoid relying on GitHub hosting.
The current 'official' answer is:
PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA
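If the zip file has to be fetched elsewhere and carried over, the unzip and mv steps can also be done from Python. This is only a sketch under that assumption: `unpack_nltk_archive` is a hypothetical helper name, and the default top-level folder name is taken from the commands above.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack_nltk_archive(zip_path, target_dir, top_folder="nltk_data-gh-pages"):
    """Extract an already-downloaded gh-pages.zip and move its top folder.

    Hypothetical helper mirroring the `unzip` and `mv` commands above.
    """
    zip_path, target_dir = Path(zip_path), Path(target_dir)
    # Extract into a scratch directory so a failed run leaves no half-moved data.
    staging = Path(tempfile.mkdtemp())
    try:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(staging)
        extracted = staging / top_folder
        if not extracted.is_dir():
            raise FileNotFoundError(f"{top_folder} not found inside {zip_path}")
        target_dir.parent.mkdir(parents=True, exist_ok=True)
        # Like `mv`: renames if target_dir is absent, moves inside it if present.
        shutil.move(str(extracted), str(target_dir))
    finally:
        shutil.rmtree(staging, ignore_errors=True)
    return target_dir
```

For example, `unpack_nltk_archive("gh-pages.zip", "/home/username/nltk_data")` corresponds to the last two shell commands.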
Upvotes: 0
Reputation: 6298
You can try downloading the Arch Linux package for nltk, which contains all the files you need.
Then copy the usr/share/nltk_data or .nltk_data folder to the appropriate path on your machine.
Upvotes: 1