Alibaba17

Reputation: 39

Getting the bigram probability (python)

I am trying to write a function that calculates the bigram probability.

So, I basically have to count the occurrences of two consecutive words (e.g. "I am") in a corpus and divide that by the count of the first of those two words.
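A minimal sketch of just the counting step follows. It assumes the corpus is a list of sentences that are already split into lists of words. The function name count_bigram and the toy corpus are illustrative and do not come from the question. Zipping each sentence with itself shifted by one word yields exactly its consecutive word pairs, so a pair never spans two sentences:

```python
def count_bigram(corpus, prev_word, word):
    """Count how often `word` immediately follows `prev_word` within a sentence."""
    return sum(
        1
        for sen in corpus
        for a, b in zip(sen, sen[1:])  # consecutive (a, b) pairs in one sentence
        if a == prev_word and b == word
    )


# Hypothetical toy corpus: a list of tokenized sentences.
corpus = [
    ["I", "am", "Sam"],
    ["Sam", "I", "am"],
    ["I", "do", "not", "like", "green", "eggs"],
]
print(count_bigram(corpus, "I", "am"))  # 2
```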

Written as a formula, the bigram (conditional) probability is:

P(W_n | W_{n-1}) = P(W_{n-1}, W_n) / P(W_{n-1})
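Suppose both probabilities are estimated by dividing raw counts C(...) by the same total N, which is what the code below does with self.total. Then the totals cancel and the estimate reduces to a ratio of counts. This is the usual maximum-likelihood bigram estimate. Strictly, a corpus has slightly fewer bigram positions than word tokens, so using one N for both is an approximation:

```latex
P(W_n \mid W_{n-1})
  = \frac{P(W_{n-1}, W_n)}{P(W_{n-1})}
  \approx \frac{C(W_{n-1}\,W_n)/N}{C(W_{n-1})/N}
  = \frac{C(W_{n-1}\,W_n)}{C(W_{n-1})}
```

For example, if "I" occurs 3 times and "I am" occurs 2 times, then P(am | I) = 2/3, or about 0.67.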

So in my code I am trying to do something like:

def prob(self, prevWord, word):
    word = word.strip()
    prevWord = prevWord.strip()
    for sen in corpus:
        for word in sen:
            if(word occurs after prevWord): #Pseudocode here
                  counter++
    numerator = counter / self.total
    prevWordProb = self.counts[prevWord]/self.total
    return numerator / prevWordProb

First of all, is my approach valid? If so, I am not sure how to code the

if(word occurs after prevWord): #Pseudocode here

part of the code. What would it look like?

Upvotes: 1

Views: 4873

Answers (1)

Elliot Roberts

Reputation: 940

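There are a few other issues with the code. For example, counter is never initialized, counter++ is not valid Python (use counter += 1), and the inner loop variable word overwrites the word argument. Once those are resolved, the loop and conditional should look something like: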

for sen in corpus:
    # Stop one word early so sen[i + 1] never runs past the end of the sentence
    for i, w in enumerate(sen[:-1]):
        if w == prevWord and sen[i + 1] == word:
            counter += 1
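The sketch below combines the question's prob method with the loop above into one runnable example. Some details are assumptions because the question does not show them. The class name BigramModel is hypothetical, and so is the way __init__ builds corpus, counts (unigram counts) and total (number of word tokens). The corpus is taken to be a list of tokenized sentences. A previous word that never occurs returns 0.0, since no smoothing is applied.

```python
from collections import Counter


class BigramModel:
    """Hypothetical container for the question's corpus, counts and total."""

    def __init__(self, corpus):
        self.corpus = corpus  # list of sentences, each a list of word strings
        self.counts = Counter(w for sen in corpus for w in sen)  # unigram counts
        self.total = sum(self.counts.values())  # total number of word tokens

    def prob(self, prevWord, word):
        """Estimate P(word | prevWord) following the question's formula."""
        word = word.strip()
        prevWord = prevWord.strip()
        if self.counts[prevWord] == 0:
            return 0.0  # unseen previous word: no estimate without smoothing
        counter = 0
        for sen in self.corpus:
            # Stop one word early so sen[i + 1] is always a valid index.
            for i, w in enumerate(sen[:-1]):
                if w == prevWord and sen[i + 1] == word:
                    counter += 1
        numerator = counter / self.total  # approximate P(prevWord, word)
        prevWordProb = self.counts[prevWord] / self.total  # P(prevWord)
        return numerator / prevWordProb  # the totals cancel out


corpus = [
    ["I", "am", "Sam"],
    ["Sam", "I", "am"],
    ["I", "do", "not", "like", "green", "eggs"],
]
model = BigramModel(corpus)
print(model.prob("I", "am"))  # roughly 0.667: 2 of the 3 "I" tokens precede "am"
```

Because self.total appears in both the numerator and the denominator, dividing by it changes nothing. The method could return counter / self.counts[prevWord] directly and give the same result.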

Upvotes: 1
