Mortz

Reputation: 4939

Alternative source for nltk data

I am trying to install the nltk corpora through these commands as mentioned in the documentation -

import nltk
nltk.download()

However, I am doing this from my stupid organization's network, which blocks GitHub, and GitHub is what the download function above tries to connect to.

Is there an alternate repository for the nltk data from where I can try this out? Trying to whitelist github and associated websites will only get tangled in red tape.

Thank you

Upvotes: 5

Views: 2679

Answers (4)

alexis

Reputation: 50220

The layout of the nltk data is pretty straightforward. Run nltk.download() on a computer that has access to GitHub, download the resources you are interested in (if you don't know yet, I recommend the "book" bundle), then find the generated nltk_data folder and copy the whole hierarchy to your work computer, at a location where nltk can find it (e.g., the location where the downloader tried to install it).
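The copy step can be sketched in Python with the standard library alone. This is only an illustration: the helper name `install_nltk_data` and the example paths in the comments are hypothetical, and it assumes the nltk_data folder has already been brought onto the work machine (for instance on a USB stick).

```python
import shutil
from pathlib import Path


def install_nltk_data(copied_dir, target_dir):
    """Copy an nltk_data hierarchy into target_dir (hypothetical helper)."""
    copied_dir, target_dir = Path(copied_dir), Path(target_dir)
    if not copied_dir.is_dir():
        raise FileNotFoundError(f"no nltk_data folder at {copied_dir}")
    # dirs_exist_ok (Python 3.8+) merges into an nltk_data folder that
    # already exists instead of raising an error.
    shutil.copytree(copied_dir, target_dir, dirs_exist_ok=True)
    return target_dir


# Example call with placeholder paths; adjust both to your machine:
#   dest = install_nltk_data("/media/usb/nltk_data", Path.home() / "nltk_data")
#
# If dest is not one of nltk's default search locations, point nltk at it:
#   import nltk
#   nltk.data.path.append(str(dest))
```

Setting the NLTK_DATA environment variable to the target folder before starting Python is another way to make nltk search it.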

Upvotes: 0

John Vandenberg

Reputation: 535

Due to issue 1787, I started building RPMs in openSUSE Build Service (OBS) repository home:jayvdb:nltk_data.

For example, for the punkt data, the .spec file is here. It is very easy to copy that for other data packs.

To install from OBS on Fedora Rawhide:

dnf config-manager --add-repo http://download.opensuse.org/repositories/home:jayvdb:nltk_data/Fedora_Rawhide/home:jayvdb:nltk_data.repo
dnf install nltk-data-punkt

More download instructions available from the OBS download page.

Upvotes: 0

John Vandenberg

Reputation: 535

There was a brief period when GitHub actually blocked all fetches of nltk_data, resulting in issue 1787 which is still open and contains many workarounds, and plans to avoid relying on GitHub hosting.

The current 'official' answer is:

# Target location for the data; adjust to your own home directory
PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ "$PATH_TO_NLTK_DATA"
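On a machine without wget or unzip, the unzip and move steps of the shell commands above could be approximated in Python. This is a rough sketch, not part of the official answer: the helper name `unpack_nltk_archive` is hypothetical, and it assumes gh-pages.zip has already been obtained some other way, since the download itself is what the firewall blocks. Unlike mv, it refuses to touch a destination that already exists.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack_nltk_archive(zip_path, dest):
    """Unzip gh-pages.zip and move its top-level folder to dest (hypothetical helper)."""
    dest = Path(dest)
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; pick an unused location")
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # The shell commands above expect this top-level folder name.
        extracted = Path(tmp) / "nltk_data-gh-pages"
        if not extracted.is_dir():
            raise FileNotFoundError("archive does not contain nltk_data-gh-pages/")
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(extracted), str(dest))
    return dest


# Example call with placeholder paths:
#   unpack_nltk_archive("gh-pages.zip", Path.home() / "nltk_data")
```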

Upvotes: 0

m00am

Reputation: 6298

You can try downloading the Arch Linux package for nltk, which contains all the files you need.

  1. Download the package from Archlinux packages website, using the Download from Mirror link in the Package Actions box on the right, or you can just use this link.
  2. Extract the file (it is an xz-compressed tar archive). I used Ark on Linux; I am not sure what the appropriate software is for your system (on Windows, 7-Zip and WinRAR should be able to handle it).
  3. You will find the files in the folder usr/share/nltk_data.
  4. Move the nltk_data folder to the appropriate path on your machine.
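If no archive tool is at hand, steps 2 to 4 above could also be done with Python's tarfile module. This is a minimal sketch under assumptions: the helper name `extract_nltk_data` is hypothetical, the package is assumed to be an xz-compressed tar as described above (a package compressed differently would need another mode), and member names are assumed to start with usr/share/nltk_data without a leading "./".

```python
import shutil
import tarfile
import tempfile
from pathlib import Path

DATA_PREFIX = "usr/share/nltk_data"  # folder named in step 3 above


def extract_nltk_data(package_path, dest):
    """Pull usr/share/nltk_data out of an xz-compressed tar package."""
    dest = Path(dest)
    # Use the safer "data" extraction filter where this Python provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(package_path, "r:xz") as archive:
            wanted = [
                member for member in archive.getmembers()
                if member.name == DATA_PREFIX
                or member.name.startswith(DATA_PREFIX + "/")
            ]
            if not wanted:
                raise FileNotFoundError(f"{DATA_PREFIX} not found in package")
            archive.extractall(tmp, members=wanted, **extra)
        # Merge into dest so an existing nltk_data folder is kept.
        shutil.copytree(Path(tmp) / DATA_PREFIX, dest, dirs_exist_ok=True)
    return dest


# Example call with placeholder names:
#   extract_nltk_data("nltk-data.pkg.tar.xz", Path.home() / "nltk_data")
```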

Upvotes: 1
