I am trying to write a function that calculates the bigram probability.
So, I basically have to count the occurrences of two consecutive words (e.g. "I am") in a corpus and divide that by the count of the first of those two words.
As a formula:
$P(W_n \mid W_{n-1}) = \frac{P(W_{n-1}, W_n)}{P(W_{n-1})}$
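A worked form of the formula, assuming both probabilities are estimated from raw corpus counts $C(\cdot)$ divided by the same total $N$ (which is what `self.total` does in the code below). The totals cancel, leaving a plain ratio of counts:

$$
P(W_n \mid W_{n-1}) = \frac{P(W_{n-1}, W_n)}{P(W_{n-1})} \approx \frac{C(W_{n-1} W_n)/N}{C(W_{n-1})/N} = \frac{C(W_{n-1} W_n)}{C(W_{n-1})}
$$

Strictly, the number of bigrams in a corpus is slightly smaller than the number of tokens, so using a single $N$ for both is an approximation; the count ratio on the right is the usual maximum-likelihood estimate of a bigram probability.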
So in my code I am trying to do something like:
def prob(self, prevWord, word):
    word = word.strip()
    prevWord = prevWord.strip()
    for sen in corpus:
        for word in sen:
            if(word occurs after prevWord): #Pseudocode here
                counter++
    numerator = counter / self.total
    prevWordProb = self.counts[prevWord]/self.total
    return numerator / prevWordProb
First of all, is my approach valid? If so, I am not sure how to code the
if(word occurs after prevWord): #Pseudocode here
part of the code. What would it look like?
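There are a few other issues with the code, but once those are resolved, the loop and conditional should look something like:
for sen in corpus:
    for i, w in enumerate(sen):
        # check i + 1 first: for the last token, sen[i + 1] would raise IndexError
        if w == prevWord and i + 1 < len(sen) and sen[i + 1] == word:
            counter += 1  # Python has no ++ operator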
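Wrapping the loop in a standalone function may make it easier to try out. This is a minimal sketch, assuming `corpus` is a list of sentences already tokenized into lists of words; the function name `bigram_prob` is hypothetical. The denominator counts every occurrence of `prev_word`, mirroring `self.counts[prevWord]` in the question, and an unseen `prev_word` returns 0.0 instead of dividing by zero:

```python
def bigram_prob(corpus, prev_word, word):
    """Estimate P(word | prev_word) as C(prev_word word) / C(prev_word).

    Hypothetical helper: corpus is assumed to be a list of sentences,
    each sentence a list of string tokens.
    """
    prev_word = prev_word.strip()
    word = word.strip()
    pair_count = 0  # C(prev_word word): times word directly follows prev_word
    prev_count = 0  # C(prev_word): every occurrence of prev_word
    for sen in corpus:
        for i, w in enumerate(sen):
            if w == prev_word:
                prev_count += 1
                # Guard the index so the last token of a sentence is safe.
                if i + 1 < len(sen) and sen[i + 1] == word:
                    pair_count += 1
    if prev_count == 0:
        return 0.0  # prev_word never seen; avoid ZeroDivisionError
    return pair_count / prev_count


corpus = [
    ["I", "am", "Sam"],
    ["Sam", "I", "am"],
    ["I", "do", "not", "like", "green", "eggs", "and", "ham"],
]
print(bigram_prob(corpus, "I", "am"))  # prints 0.6666666666666666
```

Dividing both counts by the same total would not change the result, so the sketch skips `self.total` entirely. Here "I" occurs 3 times and is followed by "am" twice, giving 2/3.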