Reputation: 51
I'm looking for a precise, step-by-step example of the Viterbi algorithm.
Consider sentence tagging with the input sentence:
The cat saw the angry dog jump
From this I would like to generate the most probable tag sequence:
D N V T A N V
How do we use the Viterbi algorithm to get the above output using a trigram-HMM?
(PS: I'm looking for a precise, step-by-step explanation, not a piece of code or a mathematical formulation. Assume all probabilities are given as numbers.)
Thanks a ton!
Upvotes: 2
Views: 1992
Reputation: 22724
For the Viterbi algorithm with a Hidden Markov Model, you first need the transition probabilities and the emission probabilities.
In your example, the transition probabilities are P(D->N), P(N->V), and so on, and the emission probabilities (assuming a bigram model) are P(the|D), P(cat|N), i.e. the probability of a word given its tag.
Of course, in a real-world example there are many more words than the, cat, saw, etc. You have to loop through all of your training data to estimate P(the|D), P(cat|N), and so on. Then we use the Viterbi algorithm to find the most likely sequence of tags, such as
D N V T A N V
given your observation.
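For concreteness, here is a minimal sketch of one way to collect such count dictionaries from a tagged corpus; the toy corpus below is just a placeholder for real training data:

from collections import defaultdict

# A toy tagged corpus; in practice, iterate over your real training data.
training_data = [
    [("the", "D"), ("cat", "N"), ("saw", "V"), ("the", "D"), ("dog", "N")],
]

t_bigram_count = defaultdict(int)   # (tag, previous_tag) counts for transitions
t_unigram_count = defaultdict(int)  # tag counts (transition denominators)
e_bigram_count = defaultdict(int)   # (word, tag) counts for emissions
e_unigram_count = defaultdict(int)  # tag counts (emission denominators)

for sentence in training_data:
    prev_tag = None
    for word, tag in sentence:
        e_bigram_count[(word, tag)] += 1
        e_unigram_count[tag] += 1
        t_unigram_count[tag] += 1
        if prev_tag is not None:
            t_bigram_count[(tag, prev_tag)] += 1
        prev_tag = tag

With add-k smoothing, P(cat|N) is then estimated as (e_bigram_count[("cat", "N")] + k) / (e_unigram_count["N"] + |vocab| * k), which is what the implementation below computes in log space.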
Here is my implementation of Viterbi. Note that it uses a bigram HMM; a trigram version would condition each transition on the two previous tags instead of one.
import numpy as np

def viterbi(vocab, vocab_tag, words, t_bigram_count, t_unigram_count,
            e_bigram_count, e_unigram_count, ADD_K):
    vocab_size = len(vocab)
    V = [{}]
    # Initialise the first column with add-k smoothed emission probabilities.
    for t in vocab_tag:
        prob = np.log2(float(e_bigram_count.get((words[0], t), 0) + ADD_K)) \
               - np.log2(float(e_unigram_count[t] + vocab_size * ADD_K))
        V[0][t] = {"prob": prob, "prev": None}
    # Fill the remaining columns left to right.
    for i in range(1, len(words)):
        V.append({})
        for t in vocab_tag:
            # Find the previous tag that maximises the path probability.
            max_trans_prob = float("-inf")
            best_prev = None
            for prev_tag in vocab_tag:
                trans_prob = np.log2(float(t_bigram_count.get((t, prev_tag), 0) + ADD_K)) \
                             - np.log2(float(t_unigram_count[prev_tag] + vocab_size * ADD_K))
                if V[i - 1][prev_tag]["prob"] + trans_prob > max_trans_prob:
                    max_trans_prob = V[i - 1][prev_tag]["prob"] + trans_prob
                    best_prev = prev_tag
            # Add the emission probability of the current word under tag t.
            max_prob = max_trans_prob \
                       + np.log2(float(e_bigram_count.get((words[i], t), 0) + ADD_K)) \
                       - np.log2(float(e_unigram_count[t] + vocab_size * ADD_K))
            # Store the back-pointer to the argmax tag, not to whichever
            # prev_tag the inner loop happened to end on.
            V[i][t] = {"prob": max_prob, "prev": best_prev}
    # Pick the most probable final state, then follow the back-pointers.
    opt = []
    previous = None
    max_prob = max(value["prob"] for value in V[-1].values())
    for st, data in V[-1].items():
        if data["prob"] == max_prob:
            opt.append(st)
            previous = st
            break
    for i in range(len(V) - 2, -1, -1):
        opt.insert(0, V[i + 1][previous]["prev"])
        previous = V[i + 1][previous]["prev"]
    return opt
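A quick usage sketch with the toy counts from above (the vocabularies and the ADD_K value are arbitrary choices):

vocab = ["the", "cat", "saw", "dog"]
vocab_tag = ["D", "N", "V"]
words = ["the", "cat", "saw", "the", "dog"]

best_tags = viterbi(vocab, vocab_tag, words,
                    t_bigram_count, t_unigram_count,
                    e_bigram_count, e_unigram_count, ADD_K=0.1)
print(best_tags)  # with these toy counts: ['D', 'N', 'V', 'D', 'N']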
Upvotes: 2
Reputation: 3744
I suggest that you look it up in one of the standard textbooks, e.g. Christopher Bishop's "Pattern Recognition and Machine Learning". The Viterbi algorithm is a fundamental technique and has been described at various levels of detail in the literature.
Upvotes: 1