Reputation: 31
I am coding a probabilistic part-of-speech tagger in Python using the Viterbi algorithm. In this context, the Viterbi probability at time t is the product of the Viterbi path probability from the previous time step t-1, the transition probability from the previous POS tag to the current POS tag, and the emission probability of the observed word given the POS tag. I compute this by, at each word in the sentence, looking at each tag/state that is possible at that position (according to the observed training data) and then, for each possible tag, computing the most likely path to that state using the Viterbi probability described above. A straightforward pseudo-code implementation can be found here.
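For concreteness, this is the update I have in mind, written as a minimal sketch (the names prev_col, transition and emission are just how I happen to store things, not taken from the pseudo-code linked above):

# prev_col[tag]          : best path probability ending in tag at position t-1
# transition[prev][tag]  : P(tag | prev), transition probability
# emission[tag][word]    : P(word | tag), emission probability
# tags_for_word          : tags observed with this word in the training data
def viterbi_step(prev_col, word, tags_for_word, transition, emission):
    col = {}
    for tag in tags_for_word:
        col[tag] = max(
            prev_col[prev] * transition[prev][tag] * emission[tag][word]
            for prev in prev_col
        )
    return col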
One practical problem with repeatedly multiplying probabilities is that it may lead to underflow. One solution that is often proposed in the literature is to use log probabilities. As far as I understand, you should do:
current_probability = math.log(emission_probability) + math.log(transition_probability) + previous_probability
instead of
current_probability = emission_probability * transition_probability * previous_probability
given that the three probabilities are stored in the corresponding variables.
One problem I am finding hard to grasp, however, is what to do when the emission probability or the transition probability is 0. Michael Collins writes: "One issue with the new algorithm is that log 0 = −∞. Some care is required to deal with probability values equal to 0. A simple solution is to set log 0 = −B where B is a large number."
What large number should I use? -math.inf?
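For reference, this is roughly how I would apply the guard; safe_log and the floor value LOG_ZERO are placeholders I made up, not a settled choice:

import math

# Value used in place of log(0); Collins suggests some large negative number -B.
# Whether this should be a finite constant or -math.inf is exactly my question.
LOG_ZERO = -1e4

def safe_log(p):
    # math.log(0) raises ValueError, so zero probabilities need a special case
    return math.log(p) if p > 0 else LOG_ZERO

# with emission_probability, transition_probability, previous_probability as above
current_probability = (safe_log(emission_probability)
                       + safe_log(transition_probability)
                       + previous_probability)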
Upvotes: 1
Views: 1191
Reputation: 103
Almost four years later, but maybe someone finds this useful.
I found that using the original multiplication formula, combined with normalizing the probabilities after processing each observation, works best for me.
# Normalize probabilities
# s = number of states
# i = index of last processed observation
# dp = probability matrix (n x s, where n is the number of observations)
sum_probs = sum(dp[i][j] for j in range(s))
for j in range(s):
    dp[i][j] /= sum_probs
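As a rough illustration of where this fits, assuming dp is filled column by column with the usual max-product update (obs, trans and emit are my own names here, not part of the snippet above):

# n = number of observations, s = number of states
# trans[p][q] = P(state q | state p), emit[q][o] = P(o | state q)
for i in range(1, n):
    for j in range(s):
        dp[i][j] = max(dp[i - 1][k] * trans[k][j] for k in range(s)) * emit[j][obs[i]]
    # Rescale the current column so values stay in a safe floating-point range.
    # Every entry in the column is divided by the same constant, so the best
    # predecessor for each state is unchanged.
    sum_probs = sum(dp[i][j] for j in range(s))
    for j in range(s):
        dp[i][j] /= sum_probs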
Upvotes: 0
Reputation: 1
Might be a bit late, but I was facing the same problem today.
There's a specific function in scipy.stats to handle the log of a zero probability more efficiently than just using np.log(stats.poisson().pmf()). You can use stats.poisson.logpmf() for that (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_discrete.logpmf.html). I don't know how it works under the hood, but it indeed returns a large negative number (with lambda = 16 for the Poisson, it returned approximately -832).
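For what it's worth, a minimal sketch of the comparison; the values of mu and k below are only chosen so that the plain pmf underflows, they are not from any particular model:

import numpy as np
from scipy import stats

mu, k = 16, 400
naive = np.log(stats.poisson.pmf(k, mu))   # pmf underflows to 0.0, so this is -inf
stable = stats.poisson.logpmf(k, mu)       # computed in log space, stays finite
print(naive, stable)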
Upvotes: 0
Reputation: 1
One way to handle this is to add some smoothing, like https://i.sstatic.net/LdNgP.png, where the local variables are:
word_tag : count of the word in the training corpus.
tag_total : number of words tagged with the given TAG, i.e. for P(word|TAG).
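As an assumption, a common form of such smoothing is add-one (Laplace) smoothing; a minimal sketch follows (the exact formula in the linked image may differ):

# Add-one (Laplace) smoothed estimate of P(word | tag), as one possible reading
# of the linked formula; variable meanings here are my own assumptions.
# count_word_tag : how often this word appears with this tag in the training data
# count_tag      : how often this tag appears in the training data
# vocab_size     : number of distinct words in the training data
def smoothed_emission(count_word_tag, count_tag, vocab_size):
    return (count_word_tag + 1) / (count_tag + vocab_size)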
Upvotes: 0