Bad zip file error while using nltk pos tagger

Question

I'm trying to use the NLTK POS-tagger, but am getting a "zipfile.BadZipfile: File is not a zip file" error.

The error comes from this code:

import nltk
sentence = "I love python"
tokens = nltk.word_tokenize(sentence)
pos_tags = nltk.pos_tag(tokens)
print nltk.ne_chunk(pos_tags, binary=True)

I found this question related to my problem. Unfortunately, I can't download the entire corpus, since I'm working on a server with tight memory restrictions. Can someone point me to the particular file I need, so I can download just that one instead of all the corpora?

(I'm using Python 2.7.6)

alexis · Accepted Answer

Try these:

import nltk
nltk.download("maxent_treebank_pos_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("punkt")

The first two are for POS tagging and named-entity chunking, respectively. The third, "punkt", is the sentence tokenizer model behind nltk.sent_tokenize(), which splits plain text into sentences. Depending on your NLTK version, nltk.word_tokenize() may need it as well (in NLTK 3 it calls sent_tokenize() internally). Since you'll be working with POS tags, I'd also download these (they're tiny):

nltk.download(["tagsets", "universal_tagset"])
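On a server with little space, a minimal sketch like the following fetches only the packages above into one dedicated directory. It assumes you want a custom location rather than NLTK's default; fetch_minimal and MINIMAL_PACKAGES are hypothetical names, while nltk.download(id, download_dir=...) and nltk.data.path are NLTK's own. The downloader parameter exists only so the function can be run without network access. The package ids are the ones from this answer; newer NLTK releases (3.1 and later) tag with "averaged_perceptron_tagger" instead, so adjust the list to your version.

```python
import os

# Package ids from the answer above (assumption: an older NLTK; NLTK 3.1+
# tags with "averaged_perceptron_tagger" instead of the maxent model).
MINIMAL_PACKAGES = ("maxent_treebank_pos_tagger", "maxent_ne_chunker", "punkt")


def fetch_minimal(download_dir, packages=MINIMAL_PACKAGES, downloader=None):
    """Download only `packages` into `download_dir`; return the ids that failed.

    Hypothetical helper. `downloader` defaults to nltk.download and is a
    parameter only so the function can be exercised without network access.
    """
    if downloader is None:
        import nltk

        downloader = nltk.download
        # Let NLTK's loaders search the custom directory in this process.
        if download_dir not in nltk.data.path:
            nltk.data.path.append(download_dir)

    # Create the target directory up front (Python 2.7 has no exist_ok).
    if not os.path.isdir(download_dir):
        os.makedirs(download_dir)

    failed = []
    for package_id in packages:
        # nltk.download returns False when a package could not be installed.
        if not downloader(package_id, download_dir=download_dir):
            failed.append(package_id)
    return failed
```

For example, fetch_minimal("/srv/app/nltk_data") (a hypothetical path) returns an empty list when every package installed. In later runs, either append the same directory to nltk.data.path or point the NLTK_DATA environment variable at it so NLTK can find the data.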

If you do have a bit of space, downloading the entire "book" collection with nltk.download("book") will give you everything you need to explore NLTK.
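If the error persists after downloading, one plausible cause (an assumption; the traceback alone doesn't prove it) is a truncated or corrupt archive left in your data directory by an interrupted download. This stdlib-only sketch lists the *.zip files that Python's zipfile module cannot read; find_bad_archives is a hypothetical helper name.

```python
import os
import zipfile


def find_bad_archives(data_dir):
    """Return sorted paths of *.zip files under `data_dir` that zipfile rejects.

    Hypothetical helper; assumes the BadZipfile error comes from a damaged
    package archive somewhere in the NLTK data directory.
    """
    bad = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(".zip"):
                path = os.path.join(root, name)
                # is_zipfile checks for a readable end-of-central-directory
                # record, which a truncated download usually lacks.
                if not zipfile.is_zipfile(path):
                    bad.append(path)
    return sorted(bad)
```

For example, find_bad_archives(os.path.expanduser("~/nltk_data")) checks the per-user location, which is one of the directories NLTK searches by default (assumed here to be where your data lives). Deleting the listed files and re-running nltk.download() for those packages should replace them.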
