Hash

Reputation: 95

ImportError: cannot import name PunktWordTokenizer

I was trying to use PunktWordTokenizer, and the following import raised an error.

from nltk.tokenize.punkt import PunktWordTokenizer

It gave the following error message:

Traceback (most recent call last):
  File "file", line 5, in <module>
    from nltk.tokenize.punkt import PunktWordTokenizer
ImportError: cannot import name PunktWordTokenizer

I've checked that nltk is installed and that PunktWordTokenizer is also installed using nltk.download(). I need some help with this.

Upvotes: 5

Views: 7659

Answers (2)

Vivek Puurkayastha

Reputation: 536

PunktWordTokenizer was previously exposed to users, but it no longer is. You can use WordPunctTokenizer instead.

from nltk.tokenize import WordPunctTokenizer
WordPunctTokenizer().tokenize("text to tokenize")

The difference is:

PunktWordTokenizer splits on punctuation but keeps it with the word, whereas WordPunctTokenizer splits all punctuation into separate tokens.

For example, given the input: This's a test

PunktWordTokenizer: ['This', "'s", 'a', 'test']
WordPunctTokenizer: ['This', "'", 's', 'a', 'test']
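
The WordPunctTokenizer result can be reproduced without NLTK, since that class is documented as a regular-expression tokenizer using the pattern \w+|[^\w\s]+. The following is only a minimal sketch under that assumption; the helper name word_punct_tokenize is made up for illustration and is not part of NLTK.

```python
import re

# Pattern documented for NLTK's WordPunctTokenizer: runs of word characters,
# or runs of characters that are neither word characters nor whitespace.
WORD_PUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text):
    """Hypothetical helper: split text the way WordPunctTokenizer is documented to."""
    return WORD_PUNCT_PATTERN.findall(text)


tokens = word_punct_tokenize("This's a test")
print(tokens)
```

The apostrophe becomes its own token here, which is the behaviour described for WordPunctTokenizer above.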

Upvotes: 4

Shubham R

Reputation: 7644

There appears to be a regression related to PunktWordTokenizer in 3.0.2. The issue was not present in 3.0.1; rolling back to that version or earlier fixes it.

>>> import nltk
>>> nltk.__version__
'3.0.2'
>>> from nltk.tokenize import PunktWordTokenizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name PunktWordTokenizer

To solve this, try pip install -U nltk to upgrade your NLTK version.
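
Because the class is only importable in some NLTK versions, one option is to guard the import so the script keeps working either way. This is a rough sketch, not an official NLTK recipe: the function name make_tokenize and the final regex fallback are assumptions added for illustration, and the fallback tokenizers do not split text exactly as PunktWordTokenizer did.

```python
import re


def make_tokenize():
    """Hypothetical helper: return a str -> list callable, preferring NLTK classes."""
    try:
        # Only importable in NLTK versions that still expose the class.
        from nltk.tokenize.punkt import PunktWordTokenizer
        return PunktWordTokenizer().tokenize
    except ImportError:
        pass
    try:
        # Replacement suggested in the other answer.
        from nltk.tokenize import WordPunctTokenizer
        return WordPunctTokenizer().tokenize
    except ImportError:
        pass
    # Last resort when NLTK itself is missing: the regex that
    # WordPunctTokenizer is documented to use.
    return re.compile(r"\w+|[^\w\s]+").findall


tokenize = make_tokenize()
print(tokenize("a test"))
```

Punctuation handling differs between the three branches, so check the output on your own text before relying on it.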

Upvotes: 2
