infr1nger

Reputation: 11

Python 3 nltk.data.load error

I'm trying to load english.pickle for sentence tokenization on Windows 7 with Python 3.4.

The file exists at the path I pass in (tokenizers/punkt/PY3/english.pickle).

Here is the code:

import nltk.data
tokenizer = nltk.data.load('tokenizers/punkt/PY3/english.pickle')

Here is the error:

OSError: No such file or directory: 'C:\\Python\\nltk_data\\tokenizers\\punkt\\PY3\\PY3\\english.pickle'

How can I fix this?

Upvotes: 1

Views: 1822

Answers (1)

b3000

Reputation: 1677

The problem is that \\PY3 appears twice in the path shown in the error message. When it is called from Python 3, nltk.data.load() adds /PY3 to the resource path itself, so passing a path that already contains /PY3 produces \\PY3\\PY3.

So it should work if you remove /PY3 from the string and load the tokenizer with:

import nltk
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

NLTK does that so the same program can run under both Python 2 and Python 3 without changing the resource path.
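
To illustrate why the doubled segment shows up, here is a minimal sketch of that kind of path rewriting. Note that add_py3_segment and its running_py3 parameter are hypothetical names made up for this illustration; this is not NLTK's actual code, and the real implementation may differ in its details.

```python
import posixpath


def add_py3_segment(resource, running_py3=True):
    """Hypothetical helper: insert a PY3 directory before the file name.

    This only mimics the behaviour described above; it is not NLTK's code.
    """
    if not running_py3:
        # Under Python 2 the resource path would be used unchanged.
        return resource
    directory, filename = posixpath.split(resource)
    return posixpath.join(directory, "PY3", filename)


# Path without /PY3: the segment is added once, which is what you want.
print(add_py3_segment("tokenizers/punkt/english.pickle"))

# Path that already contains /PY3: the segment ends up doubled,
# matching the \\PY3\\PY3 in the error message.
print(add_py3_segment("tokenizers/punkt/PY3/english.pickle"))
```

The second call shows the situation from the question: the caller has already added /PY3, so the automatic rewrite adds it a second time.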

Upvotes: 5
