Mark Grey

Reputation: 10257

What is the default nltk part of speech tagset?

While experimenting with NLTK part-of-speech tagging, I noticed a lot of VBP tags in the output of my calls to nltk.pos_tag. This tag is not in the Brown Corpus part-of-speech tagset; it is, however, part of the UPenn tagset.

What tagset does nltk use by default? I can't find this in the official documentation or the apidocs.

Upvotes: 8

Views: 4349

Answers (3)

Simone

Reputation: 615

NLTK uses the Penn Treebank tagset by default. Other taggers (with other tagsets) are also available as part of the NLTK library.

Upvotes: 0

Mayank Gour

Reputation: 136

It uses the POS tags from the Penn Treebank Project. You can see the list of tags with their meanings at http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
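
To make the VBP tag from the question concrete, here is a minimal sketch of the verb tags in the Penn Treebank tagset. The names PENN_VERB_TAGS, describe, and tag_with_nltk are hypothetical helpers made up for illustration, not part of NLTK. The lookup table needs no NLTK install; the last function assumes NLTK and its default tagger data are installed and is only defined here, not called.

```python
# Verb tags from the Penn Treebank tagset (short glosses).
PENN_VERB_TAGS = {
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
}


def describe(tag):
    """Return a short gloss for a Penn Treebank verb tag (hypothetical helper)."""
    return PENN_VERB_TAGS.get(tag, "not a Penn Treebank verb tag")


def tag_with_nltk(sentence):
    """Tag a whitespace-split sentence with NLTK's default tagger.

    Assumption: nltk and its tagger data are installed. The import is
    done lazily so the lookup table above works without NLTK.
    """
    import nltk
    return nltk.pos_tag(sentence.split())


print(describe("VBP"))
```

If NLTK is installed, nltk.help.upenn_tagset('VBP') should print the description that ships with the library, provided the tagset help data has been downloaded.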

Upvotes: 5

Chandan Gupta

Reputation: 1480

NLTK uses the Penn Treebank tagset. Have a look at this link: http://nltk.org/api/nltk.tag.html

Upvotes: 8
