Mangu Singh Rajpurohit

Reputation: 11420

What's the difference between pos_tag, UnigramTagger and BigramTagger in NLTK?

I am trying to get my hands dirty with NLTK. I am referring to http://victoria.lviv.ua/../NaturalLanguageProcessingWithPython.pdf. It states that the nltk.pos_tag function assigns a part of speech to each word in the list of words passed to it as an argument.

Moving ahead, I found that there are also nltk.DefaultTagger, nltk.RegexpTagger, nltk.UnigramTagger and nltk.BigramTagger.

I am confused about why we need these taggers, since nltk.pos_tag already does a good job of tagging parts of speech. Moreover, which tagger does nltk.pos_tag use internally?

Thanks in advance.

Upvotes: 1

Views: 1450

Answers (1)

alvas

Reputation: 122148

The default nltk.pos_tag uses

  • a pre-trained PerceptronTagger model
  • trained on Sections 00-18 of the Wall Street Journal portion of OntoNotes 5.
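To make the "pre-trained" part concrete, here is a minimal sketch (assuming a recent NLTK 3.x) that trains the same PerceptronTagger class from scratch on a tiny, made-up corpus; `toy_corpus` is hypothetical data, not anything shipped with NLTK. nltk.pos_tag skips this training step and instead loads pre-trained English weights, which must first be fetched with nltk.download (the exact resource name depends on the NLTK version).

```python
# Sketch: train NLTK's PerceptronTagger from scratch on hypothetical toy data.
# nltk.pos_tag uses this class too, but with downloaded pre-trained weights.
from nltk.tag.perceptron import PerceptronTagger

# Hypothetical training data: a list of sentences, each a list of (word, tag) pairs.
toy_corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

# load=False means "start with an empty model" instead of loading the English one.
tagger = PerceptronTagger(load=False)
tagger.train(toy_corpus, nr_iter=5)

# Tags are predicted by the averaged perceptron learned above.
tagged = tagger.tag(["the", "cat", "barks"])
print(tagged)
```

With only three training sentences the predictions are not meant to be trustworthy; the point is that pos_tag is one particular trained instance of this class.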

The data and walk-through documentation can be found on:


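UnigramTagger and BigramTagger, on the other hand, are classes that contain no pre-trained model: you have to train them yourself on a corpus of tagged sentences before they can tag anything.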
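A minimal sketch of what training them looks like: both constructors take a list of tagged sentences. The corpus below is made up for illustration; in practice you would pass a real tagged corpus such as nltk.corpus.brown.tagged_sents(), as the NLTK book does.

```python
# Sketch: train UnigramTagger and BigramTagger on a hypothetical toy corpus.
from nltk.tag import BigramTagger, UnigramTagger

train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("i", "PRP"), ("can", "MD"), ("run", "VB")],
    [("a", "DT"), ("can", "NN"), ("of", "IN"), ("soup", "NN")],
    [("the", "DT"), ("can", "NN"), ("rusts", "VBZ")],
]

# UnigramTagger: each word gets its most frequent training tag, ignoring context
# ("can" was NN twice and MD once, so it is always tagged NN).
unigram = UnigramTagger(train_sents)

# BigramTagger: the context is (previous tag, current word), so "can" after a
# pronoun (MD) is distinguished from "can" after a determiner (NN).
bigram = BigramTagger(train_sents)

print(unigram.tag(["i", "can", "run"]))
# [('i', 'PRP'), ('can', 'NN'), ('run', 'VB')]
print(bigram.tag(["i", "can", "run"]))
# [('i', 'PRP'), ('can', 'MD'), ('run', 'VB')]

# An unseen word or context yields None, and that None also becomes the
# "previous tag" for the next word, so the failure propagates.
print(bigram.tag(["the", "bird", "barks"]))
# [('the', 'DT'), ('bird', None), ('barks', None)]
```

This sparse-data problem is why these taggers are usually combined through their backoff parameter rather than used alone.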

Chapter 5 of the NLTK book (http://www.nltk.org/book/ch05.html) introduces the POS taggers available in NLTK:

  • DefaultTagger: Chapter 5, Section 4.1
  • RegexpTagger: Chapter 5, Section 4.2
  • NgramTagger: Chapter 5, Section 5.3
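The taggers above are typically chained with the backoff parameter, so each one hands the words it cannot tag to a more general tagger. A minimal sketch of such a chain; the training sentences are made up, and the regex patterns are only modelled on the style used in Chapter 5, not copied from it.

```python
# Sketch of a backoff chain: Bigram -> Unigram -> Regexp -> Default.
# Training data and regex patterns are hypothetical, for illustration only.
from nltk.tag import BigramTagger, DefaultTagger, RegexpTagger, UnigramTagger

train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("i", "PRP"), ("can", "MD"), ("run", "VB")],
    [("a", "DT"), ("can", "NN"), ("of", "IN"), ("soup", "NN")],
    [("the", "DT"), ("can", "NN"), ("rusts", "VBZ")],
]

# Last resort: call every remaining word a noun.
t0 = DefaultTagger("NN")
# Guess from the word's shape; the first matching pattern wins.
t1 = RegexpTagger(
    [
        (r"^-?[0-9]+(\.[0-9]+)?$", "CD"),  # cardinal numbers
        (r".*ing$", "VBG"),                # gerunds
        (r".*s$", "NNS"),                  # plural nouns
    ],
    backoff=t0,
)
t2 = UnigramTagger(train_sents, backoff=t1)
t3 = BigramTagger(train_sents, backoff=t2)

print(t3.tag(["i", "can", "run", "3", "laps", "today"]))
# [('i', 'PRP'), ('can', 'MD'), ('run', 'VB'), ('3', 'CD'), ('laps', 'NNS'), ('today', 'NN')]
```

Here "can" is resolved by the bigram tagger, "i" and "run" by the unigram tagger, "3" and "laps" by the regex tagger, and the unseen word "today" falls through to the default tagger.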

Upvotes: 2
