what's the difference between pos_tag and UnigramTagger and BigramTagger in nltk?

Question

I am trying to get my hands dirty with nltk. I am following http://victoria.lviv.ua/../NaturalLanguageProcessingWithPython.pdf, which states that the nltk.pos_tag function assigns a part of speech to each word in the list of words passed to it as an argument.

Moving ahead, I found that there are also nltk.DefaultTagger, nltk.RegexpTagger, nltk.UnigramTagger and nltk.BigramTagger.

I am confused about why we need these taggers, since nltk.pos_tag already does a good job of tagging parts of speech. Also, which tagger does nltk.pos_tag use internally?

Thanks in advance.

alvas · Accepted Answer

The default nltk.pos_tag is a pre-trained PerceptronTagger model, trained on Sections 00-18 of the Wall Street Journal sections of OntoNotes 5.

The data and walk-through documentation can be found at:

Data: https://catalog.ldc.upenn.edu/ldc2013t19
Algorithm: https://explosion.ai/blog/part-of-speech-pos-tagger-in-python
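For English, nltk.pos_tag(tokens) essentially creates a PerceptronTagger with that pre-trained model loaded and calls its .tag() method on your tokens. The sketch below is only an illustration of that class, not how the shipped model was built: it instantiates PerceptronTagger with load=False (so no NLTK data download is needed) and trains it on a tiny corpus. TOY_CORPUS is made up for this example.

```python
import random

from nltk.tag.perceptron import PerceptronTagger

# Hypothetical toy training data: each sentence is a list of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("barks", "VBZ")],
]

# Training shuffles the sentences between iterations; seed for repeatable runs.
random.seed(0)

# load=False skips loading the pre-trained English weights that nltk.pos_tag
# relies on, so this tagger starts out with an empty model.
tagger = PerceptronTagger(load=False)
tagger.train(TOY_CORPUS, nr_iter=10)

tags = tagger.tag(["the", "cat", "barks"])
print(tags)
```

With the default load=True, PerceptronTagger() instead loads the pre-trained model, which requires the tagger resource to have been fetched with nltk.download first; that is the situation nltk.pos_tag is in.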

The UnigramTagger and BigramTagger, on the other hand, are classes that contain no pre-trained model: you have to train them on your own tagged sentences before they can tag anything.
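A minimal sketch of what that means in practice, using a tiny hypothetical corpus (train_sents is made up for illustration). UnigramTagger picks the most frequent training tag for each word regardless of context, so it tags "can" as MD here. BigramTagger also looks at the previous tag, so it can tag "can" after a determiner as NN. For contexts it never saw, it falls back to the unigram tagger and then to a DefaultTagger via the backoff parameter.

```python
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

# Hypothetical training corpus: a list of sentences of (word, tag) pairs.
train_sents = [
    [("I", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("you", "PRP"), ("can", "MD"), ("run", "VB")],
    [("a", "DT"), ("can", "NN"), ("of", "IN"), ("soup", "NN")],
]

# Backoff chain: bigram -> unigram -> default ("NN" for anything unseen).
default = DefaultTagger("NN")
unigram = UnigramTagger(train_sents, backoff=default)
bigram = BigramTagger(train_sents, backoff=unigram)

tokens = ["a", "can", "of", "beans"]  # "beans" never appears in training
unigram_tags = unigram.tag(tokens)
bigram_tags = bigram.tag(tokens)
print(unigram_tags)  # [('a', 'DT'), ('can', 'MD'), ('of', 'IN'), ('beans', 'NN')]
print(bigram_tags)   # [('a', 'DT'), ('can', 'NN'), ('of', 'IN'), ('beans', 'NN')]
```

The backoff chain matters because a bigram tagger trained on little data has seen few (previous tag, word) contexts; without a backoff it would return None for every unseen one.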

Chapter 5 of the NLTK book (http://www.nltk.org/book/ch05.html) gives an introduction to the POS taggers that are available:

DefaultTagger: Chapter 5, Section 4.1
RegexpTagger: Chapter 5, Section 4.2
NgramTagger: Chapter 5, Section 5.3
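The two simplest taggers in that list need no training data at all. The sketch below uses a few illustrative suffix patterns, loosely modeled on the style of the book's example but not copied from it; RegexpTagger returns the tag of the first pattern that matches each token, while DefaultTagger gives every token the same tag.

```python
from nltk.tag import DefaultTagger, RegexpTagger

tokens = ["3", "dogs", "jumped", "running", "fence"]

# DefaultTagger assigns the same tag to every token.
default = DefaultTagger("NN")

# RegexpTagger tries the patterns in order and uses the first one that matches.
# These patterns are illustrative only.
patterns = [
    (r".*ing$", "VBG"),                # gerunds
    (r".*ed$", "VBD"),                 # simple past
    (r".*s$", "NNS"),                  # plural nouns
    (r"^-?[0-9]+(\.[0-9]+)?$", "CD"),  # cardinal numbers
    (r".*", "NN"),                     # everything else
]
regexp = RegexpTagger(patterns)

default_tags = default.tag(tokens)
regexp_tags = regexp.tag(tokens)
print(default_tags)  # [('3', 'NN'), ('dogs', 'NN'), ('jumped', 'NN'), ('running', 'NN'), ('fence', 'NN')]
print(regexp_tags)   # [('3', 'CD'), ('dogs', 'NNS'), ('jumped', 'VBD'), ('running', 'VBG'), ('fence', 'NN')]
```

In practice these are mostly useful as the last links of a backoff chain behind trained n-gram taggers, catching words the trained taggers have never seen.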
