Shiv

Bad zip file error while using the NLTK POS tagger

I'm trying to use the NLTK POS-tagger, but am getting a "zipfile.BadZipfile: File is not a zip file" error.

The error comes from this code:

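import nltk
sentence = "I love python"
tokens = nltk.word_tokenize(sentence)        # split the sentence into word tokens
pos_tags = nltk.pos_tag(tokens)              # tag each token with its part of speech
print(nltk.ne_chunk(pos_tags, binary=True))  # chunk named entities, labelled just "NE"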

I found a question related to my problem. Unfortunately, I can't download the entire corpus, since I'm working on a server with strict memory restrictions. Can someone point me to the particular file I need, so that I can download just that one instead of the entire corpora?

(I'm using Python 2.7.6)

Answers (1)

alexis

Try these:

nltk.download("maxent_treebank_pos_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("punkt")

The first two are for POS tagging and named-entity chunking, respectively. The third, punkt, is what nltk.sent_tokenize() uses to break plain text into sentences; your code sample doesn't call that directly, but you'll need it (and in recent NLTK versions nltk.word_tokenize() uses it internally too). Since you'll be working with POS tags, I'd also download these (they're tiny):

nltk.download(["tagsets", "universal_tagset"])

If you do have a bit of space, downloading the entire "book" collection (nltk.download("book")) will give you everything you need to explore NLTK.
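
For the space-constrained server case, a minimal sketch of fetching only the packages named in this answer might look like the following. It assumes these package ids are the right ones for your NLTK version (in newer releases nltk.pos_tag() loads averaged_perceptron_tagger instead, so check the name in any LookupError you get), and the resource paths in REQUIRED are my assumption of where each package unpacks. NLTK_DATA_DIR, REQUIRED, missing_packages and ensure_packages are hypothetical names; nltk.data.find, nltk.data.path and nltk.download(..., download_dir=...) are existing NLTK calls.

```python
# Sketch: fetch only the NLTK data packages named in this answer into a
# directory of your choice, skipping packages that are already installed.
# NLTK_DATA_DIR, REQUIRED and both helper functions are hypothetical names.

# Downloader package id -> resource path that nltk.data.find() looks up
# (assumed install locations).
REQUIRED = {
    "maxent_treebank_pos_tagger": "taggers/maxent_treebank_pos_tagger",
    "maxent_ne_chunker": "chunkers/maxent_ne_chunker",
    "punkt": "tokenizers/punkt",
    "tagsets": "help/tagsets",
    "universal_tagset": "taggers/universal_tagset",
}

NLTK_DATA_DIR = "/path/to/nltk_data"  # placeholder: a directory with free space


def missing_packages(required, find):
    """Return the package ids whose resource `find` cannot locate.

    `find` must behave like nltk.data.find(): raise LookupError (or a
    subclass such as KeyError) when the resource is not installed.
    """
    missing = []
    for package_id, resource_path in sorted(required.items()):
        try:
            find(resource_path)
        except LookupError:
            missing.append(package_id)
    return missing


def ensure_packages(required=REQUIRED, download_dir=NLTK_DATA_DIR):
    """Download each missing package into download_dir."""
    import nltk  # imported lazily so missing_packages() works without NLTK

    # Let NLTK search the custom directory as well as its default locations.
    if download_dir not in nltk.data.path:
        nltk.data.path.append(download_dir)
    for package_id in missing_packages(required, nltk.data.find):
        nltk.download(package_id, download_dir=download_dir)
```

Scripts that run later need to find the same directory, for example by setting the NLTK_DATA environment variable to it or appending it to nltk.data.path before tagging.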
