user2604504

Reputation: 717

Part of speech tagging with Viterbi algorithm

I am working on a project where I need to use the Viterbi algorithm to do part-of-speech tagging on a list of sentences. My training data consists of sentences in which every word is already tagged, which I assume I need to parse and store in some data structure. I also have test data, which likewise contains sentences where each word is tagged.

I'm a bit confused about how to approach this problem. I guess part of the issue is that I don't think I fully understand the point of the Viterbi algorithm. Am I supposed to use the Viterbi algorithm to tag my test data and compare the results to the actual tags? What data structures are best for doing this and for representing a sentence?

Any help would be greatly appreciated.

Upvotes: 1

Views: 4228

Answers (1)

aerin

Reputation: 22724

The Viterbi algorithm is not what tags your training data. For training, you need data that has already been tagged, either manually or semi-automatically by a state-of-the-art parser.
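As a rough sketch of what "storing the tagged training data" could look like: one option is to keep each sentence as a list of (word, tag) pairs and count tag-to-tag transitions and tag-to-word emissions from it. The toy sentences, the tag names, the `<s>` start marker, and the `count_hmm` helper below are all made up for illustration, not taken from any particular corpus or library.

```python
from collections import Counter, defaultdict

# Toy tagged corpus: each sentence is a list of (word, tag) pairs.
# The sentences and tag names are invented for this example.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

def count_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # transitions[prev_tag][tag]
    emissions = defaultdict(Counter)    # emissions[tag][word]
    for sentence in sentences:
        prev = "<s>"  # hypothetical start-of-sentence marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            prev = tag
    return transitions, emissions

transitions, emissions = count_hmm(train)
```

Dividing each count by its row total would give rough probability estimates; a real tagger would probably also need smoothing for unseen words.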

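Viterbi is used to calculate the best path to each node, where a node is one tag at one word position and the best path is the one with the lowest negative log probability. The best path to the last word of a sentence gives the most likely tag sequence for that sentence.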
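To make that concrete, here is a minimal sketch of Viterbi for a bigram HMM, working with negative log probabilities. It is not the code from any specific tagger; the `viterbi` function, its parameters, and the hand-picked toy probabilities are assumptions for illustration only.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return (cost, tags) for the path with the lowest negative log probability."""
    def nlog(p):
        # Zero probability becomes infinite cost.
        return -math.log(p) if p > 0 else math.inf

    # best[i][t] = (cost of the cheapest path ending in tag t at word i, previous tag)
    best = [{
        t: (nlog(start_p.get(t, 0)) + nlog(emit_p[t].get(words[0], 0)), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            emit = nlog(emit_p[t].get(words[i], 0))
            # Pick the predecessor tag that gives the cheapest path into t.
            column[t] = min(
                (best[i - 1][p][0] + nlog(trans_p[p].get(t, 0)) + emit, p)
                for p in tags
            )
        best.append(column)

    # Backtrack from the cheapest tag at the last word.
    last = min(tags, key=lambda t: best[-1][t][0])
    cost = best[-1][last][0]
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return cost, path

# Toy model with made-up probabilities.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.2}
trans_p = {
    "DET": {"NOUN": 1.0},
    "NOUN": {"VERB": 0.7, "NOUN": 0.3},
    "VERB": {"DET": 1.0},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.6, "barks": 0.4},
    "VERB": {"barks": 1.0},
}

cost, path = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
print(path)  # ['DET', 'NOUN', 'VERB']
```

For evaluation, one reasonable approach is to run this on the words of each test sentence and compare the predicted tags with the tags given in the test data.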

A Python implementation of an HMM (Viterbi) POS tagger: https://github.com/zachguo/HMM-Trigram-Tagger/blob/master/HMM.py

Upvotes: 2
