How to append values to a generator when using bigrams with ConditionalFreqDist in Python?

Question

Context: I'm using NLTK to generate bigram probabilities. I have generated bigrams from a corpus; 'wordPairsBigram' holds those corpus bigrams. I also have the sample sentence "The company chairman said he will increase the profit next year"; 'wordPairSentence1' holds the bigrams of that sentence.
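The question doesn't show how the two bigram collections were built. Below is a minimal sketch, assuming whitespace tokenisation and a hypothetical toy corpus (corpus_text, sentence and make_bigrams are illustrative names, not from the question). It uses zip in place of nltk.bigrams, which yields the same consecutive pairs but, in NLTK 3, as a generator. Materialising the pairs with list() matters because the filter in the question runs a repeated `in` test, and a membership test on a generator consumes it.

```python
# Hypothetical toy corpus; the real corpus and its tokenisation are not shown in the question.
corpus_text = "The company chairman said he will retire . The chairman said he will stay"
sentence = "The company chairman said he will increase the profit next year"


def make_bigrams(text):
    """Hypothetical helper: return the bigrams of a whitespace-tokenised text as a list.

    zip(tokens, tokens[1:]) yields the same pairs as nltk.bigrams(tokens);
    list() makes the result reusable, unlike a one-shot generator.
    """
    tokens = text.split()
    return list(zip(tokens, tokens[1:]))


wordPairsBigram = make_bigrams(corpus_text)
wordPairSentence1 = make_bigrams(sentence)
print(wordPairSentence1[:3])
# [('The', 'company'), ('company', 'chairman'), ('chairman', 'said')]

# Why list() matters: a membership test on a generator consumes it.
gen = (pair for pair in wordPairSentence1)
first = ("The", "company") in gen   # True, iteration stops right after the match
second = ("The", "company") in gen  # False: that item has already been consumed
```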

The problem: I need to generate bigram probabilities. To do that, I need the conditional frequency distribution of the sample sentence's bigrams, which I will then pass to ConditionalProbDist. The following code computes the conditional frequencies of those sentence bigrams that also occur in the corpus:

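from nltk.probability import ConditionalFreqDist
# Count only those corpus bigrams that also occur in the sample sentence
fdListSentence1 = ConditionalFreqDist(wordBigram for wordBigram in wordPairsBigram if wordBigram in wordPairSentence1)
fdListSentence1.tabulate()  # tabulate() prints the table itself and returns None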

output:
        company   he said will year
     The    8    0    0    0    0
chairman    0    0    7    0    0
      he    0    0    0    2    0
    next    0    0    0    0    5
    said    0   21    0    0    0
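Because these counts are meant to feed ConditionalProbDist, it is worth checking what a maximum-likelihood estimate would make of them. This sketch assumes MLE (the question does not say which estimator will be used) and uses collections.Counter as a stand-in for NLTK's FreqDist. The counts are copied from the table above. Since each row has been normalised only over the bigrams that survived the filter, every observed P(w2 | w1) comes out as 1.0.

```python
from collections import Counter

# Counts copied from the tabulated output above (corpus bigrams after filtering).
filtered = {
    "The": Counter({"company": 8}),
    "chairman": Counter({"said": 7}),
    "he": Counter({"will": 2}),
    "next": Counter({"year": 5}),
    "said": Counter({"he": 21}),
}


def mle(cond_counts, w1, w2):
    """Maximum-likelihood P(w2 | w1) = count(w1, w2) / count(w1, *).

    This is the ratio MLEProbDist takes from each condition's FreqDist;
    returning 0.0 for a condition with no counts is this sketch's own convention.
    """
    row = cond_counts.get(w1, Counter())
    total = sum(row.values())
    return row[w2] / total if total else 0.0


print(mle(filtered, "The", "company"))  # 1.0
```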

The issue: the code works fine for every bigram that occurs in both the corpus and the sample sentence. However, a few bigrams in the sample sentence do not occur in the corpus, and they are left out of the frequency distribution entirely.
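The generator expression in the question only ever yields bigrams taken from wordPairsBigram, so the `in` test can remove corpus bigrams but can never add a sentence bigram that the corpus lacks. Below is a stdlib sketch with hypothetical toy data. It uses a defaultdict of Counter as a stand-in for ConditionalFreqDist, which in NLTK 3 is itself a defaultdict of FreqDist, a Counter subclass.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: ("will", "increase") occurs in the sentence but not in the corpus.
wordPairsBigram = [("The", "company"), ("company", "chairman"), ("The", "company"), ("he", "will")]
wordPairSentence1 = [("The", "company"), ("company", "chairman"), ("he", "will"), ("will", "increase")]

# Stand-in for ConditionalFreqDist: condition (first word) -> Counter of following words.
cfd = defaultdict(Counter)
for w1, w2 in wordPairsBigram:          # iterates over the corpus bigrams only
    if (w1, w2) in wordPairSentence1:   # same filter as in the question
        cfd[w1][w2] += 1

# "will" never becomes a condition, so ("will", "increase") cannot appear in the table.
print(dict(cfd))
```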

What I want: the frequency distribution for all of the bigrams in the sentence. If a sentence bigram does not occur among the corpus bigrams, I want it to show up with a value of 0 when tabulated.
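One possible approach, sketched with the standard library under the same stand-in assumption (defaultdict of Counter for ConditionalFreqDist) and hypothetical toy data, is to count the full corpus first and then build the table from the sentence bigrams rather than from the corpus bigrams. The tabulate function here is a hypothetical helper, not NLTK's method. Counting without the filter also keeps each row's total equal to its full corpus count, which matters once the counts are normalised into probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: ("will", "increase") is in the sentence but not in the corpus.
wordPairsBigram = [("The", "company"), ("company", "chairman"), ("The", "company"),
                   ("The", "board"), ("he", "will")]
wordPairSentence1 = [("The", "company"), ("company", "chairman"), ("he", "will"),
                     ("will", "increase")]

# 1. Count ALL corpus bigrams (no filtering), so each row keeps its full corpus total.
corpus_cfd = defaultdict(Counter)
for w1, w2 in wordPairsBigram:
    corpus_cfd[w1][w2] += 1

# 2. Build the table from the sentence side: every sentence bigram gets an entry
#    holding its corpus count, or an explicit 0 when the pair never occurs there.
#    .get() avoids adding empty rows to corpus_cfd for unseen first words.
sentence_cfd = defaultdict(Counter)
for w1, w2 in wordPairSentence1:
    sentence_cfd[w1][w2] = corpus_cfd.get(w1, Counter())[w2]


def tabulate(cfd):
    """Hypothetical helper: print a condition-by-sample grid like ConditionalFreqDist.tabulate()."""
    samples = sorted({w2 for row in cfd.values() for w2 in row})
    print(" " * 10 + "".join(f"{s:>10}" for s in samples))
    for w1 in sorted(cfd):
        # Counter returns 0 for samples missing from this row without inserting them.
        print(f"{w1:>10}" + "".join(f"{cfd[w1][s]:>10}" for s in samples))


tabulate(sentence_cfd)
```

In NLTK terms, the analogous idea would be to build the ConditionalFreqDist from all of wordPairsBigram and read cfd[w1][w2] for each sentence bigram. Because FreqDist subclasses Counter, unseen samples read as 0. ConditionalFreqDist.tabulate() also accepts conditions= and samples= keyword arguments, which should allow the printed grid to be restricted to the sentence's words.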

Any help is appreciated; I don't know how to work this into the code.
