Reputation: 31
I am coding a probabilistic part-of-speech tagger in Python using the Viterbi algorithm. In this context, the Viterbi probability at time t is the product of the Viterbi path probability from the previous time step t-1, the transition probability from the previous POS tag to the current POS tag, and the emission probability of the observed word given the POS tag. I compute this by, at each word in the sentence, looking at each tag/state that is possible at that position (according to the observed training data) and then, for each possible tag, computing the most likely path to that state using the Viterbi probability described above. A straightforward pseudo-code implementation can be found here.
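For concreteness, this is the update I have in mind, written as a minimal sketch (the names prev_col, transition and emission are just how I happen to store things, not taken from the pseudo-code linked above):

# prev_col[tag]          : best path probability ending in tag at position t-1
# transition[prev][tag]  : P(tag | prev), transition probability
# emission[tag][word]    : P(word | tag), emission probability
# tags_for_word          : tags observed with this word in the training data
def viterbi_step(prev_col, word, tags_for_word, transition, emission):
    col = {}
    for tag in tags_for_word:
        col[tag] = max(
            prev_col[prev] * transition[prev][tag] * emission[tag][word]
            for prev in prev_col
        )
    return col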
One practical problem with repeatedly multiplying probabilities is that it may lead to underflow. One solution that is often proposed in the literature is to use log probabilities. As far as I understand, you should do:
current_probability = math.log(emission_probability) + math.log(transition_probability) + previous_probability
instead of
current_probability = emission_probability * transition_probability * previous_probability
given that the three probabilities are stored in the corresponding variables.
One problem I am finding hard to grasp, however, is what to do when the emission probability or the transition probability is 0. Michael Collins writes: "One issue with the new algorithm is that log 0 = −∞. Some care is required to deal with probability values equal to 0. A simple solution is to set log 0 = −B where B is a large number."
What large number should I use? -math.inf?
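For reference, this is roughly how I would apply the guard; safe_log and the floor value LOG_ZERO are placeholders I made up, not a settled choice:

import math

# Value used in place of log(0); Collins suggests some large negative number -B.
# Whether this should be a finite constant or -math.inf is exactly my question.
LOG_ZERO = -1e4

def safe_log(p):
    # math.log(0) raises ValueError, so zero probabilities need a special case
    return math.log(p) if p > 0 else LOG_ZERO

# with emission_probability, transition_probability, previous_probability as above
current_probability = (safe_log(emission_probability)
                       + safe_log(transition_probability)
                       + previous_probability)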
Upvotes: 1
Views: 1191
Reputation: 103
Almost four years later, but maybe someone finds this useful.
I found that using the original multiplication formula, combined with normalizing the probabilities after processing each observation, works best for me.
# Normalize probabilities
# s = number of states
# i = index of last processed observation
# dp = probability matrix (n x s, where n is the number of observations)
sum_probs = sum(dp[i][j] for j in range(s))
for j in range(s):
    dp[i][j] /= sum_probs
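As a rough illustration of where this fits, assuming dp is filled column by column with the usual max-product update (obs, trans and emit are my own names here, not part of the snippet above):

# n = number of observations, s = number of states
# trans[p][q] = P(state q | state p), emit[q][o] = P(o | state q)
for i in range(1, n):
    for j in range(s):
        dp[i][j] = max(dp[i - 1][k] * trans[k][j] for k in range(s)) * emit[j][obs[i]]
    # Rescale the current column so values stay in a safe floating-point range.
    # Every entry in the column is divided by the same constant, so the best
    # predecessor for each state is unchanged.
    sum_probs = sum(dp[i][j] for j in range(s))
    for j in range(s):
        dp[i][j] /= sum_probs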
Upvotes: 0
Reputation: 1
Might be a bit late, but I was facing the same problem today.
There's a specific function in scipy.stats to handle the log of a zero probability more efficiently than just using np.log(stats.poisson().pmf()). You can use stats.poisson.logpmf() for that (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_discrete.logpmf.html). I don't know how it works under the hood, but it indeed returns a large negative number (with lambda = 16 for the Poisson, it returned approximately -832).
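For what it's worth, a minimal sketch of the comparison; the values of mu and k below are only chosen so that the plain pmf underflows, they are not from any particular model:

import numpy as np
from scipy import stats

mu, k = 16, 400
naive = np.log(stats.poisson.pmf(k, mu))   # pmf underflows to 0.0, so this is -inf
stable = stats.poisson.logpmf(k, mu)       # computed in log space, stays finite
print(naive, stable)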
Upvotes: 0
Reputation: 1
One way to handle this is to add some smoothing, like https://i.sstatic.net/LdNgP.png, where the local variables are:
word_tag : count of the word in the training corpus.
tag_total : number of words tagged with the given TAG, i.e. for P(word|TAG).
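As an assumption, a common form of such smoothing is add-one (Laplace) smoothing; a minimal sketch follows (the exact formula in the linked image may differ):

# Add-one (Laplace) smoothed estimate of P(word | tag), as one possible reading
# of the linked formula; variable meanings here are my own assumptions.
# count_word_tag : how often this word appears with this tag in the training data
# count_tag      : how often this tag appears in the training data
# vocab_size     : number of distinct words in the training data
def smoothed_emission(count_word_tag, count_tag, vocab_size):
    return (count_word_tag + 1) / (count_tag + vocab_size)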
Upvotes: 0