Reputation: 375
My bigram language model works fine when one word is given as input, but when I give two words to my trigram model, it behaves strangely and predicts 'unknown' as the next word. My code:
    def get_unigram_probability(word):
        if word not in unigram:
            return 0
        return unigram[word] / total_words
    def get_bigram_probability(words):
        if words not in bigram:
            return 0
        return bigram[words] / unigram[words[0]]
    V = len(vocabulary)

    def get_trigram_probability(words):
        if words not in trigram:
            return 0
        return trigram[words] + 1 / bigram[words[:2]] + V
For bigram next-word prediction:
    def find_next_word_bigram(words):
        candidate_list = []
        # Calculate the probability for each word in the vocabulary
        for word in vocabulary:
            p2 = get_bigram_probability((words[-1], word))
            candidate_list.append((word, p2))
        # Sort so the most probable words come first
        candidate_list.sort(key=lambda x: x[1], reverse=True)
        # print(candidate_list)
        return candidate_list[0]
For trigram next-word prediction:
    def find_next_word_trigram(words):
        candidate_list = []
        # Calculate the probability for each word in the vocabulary
        for word in vocabulary:
            p3 = get_trigram_probability((words[-2], words[-1], word)) if len(words) >= 3 else 0
            candidate_list.append((word, p3))
        # Sort so the most probable words come first
        candidate_list.sort(key=lambda x: x[1], reverse=True)
        # print(candidate_list)
        return candidate_list[0]
I just want to know where in the code I should make changes so that the trigram model predicts the next word when given an input of two words.
Upvotes: 2
Views: 645
Reputation: 15623
When you build your trigrams, use a special BOS (beginning-of-sentence) token so you can handle short sequences. Before each sentence, add BOS twice, like so:
    I like cheese
    BOS BOS I like cheese
This way, when you take input from the user, you can prepend BOS BOS to it and complete even short sequences.
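As a minimal sketch of what that could look like, assuming your sentences are already tokenized into lists and that unigram, bigram, and trigram are plain count dictionaries as in your code (BOS, count_ngrams, and pad_query are illustrative names, not part of your code):

    from collections import defaultdict

    BOS = "<BOS>"  # special beginning-of-sentence token

    unigram = defaultdict(int)
    bigram = defaultdict(int)
    trigram = defaultdict(int)

    def count_ngrams(sentences):
        # sentences: an iterable of token lists, e.g. [["I", "like", "cheese"], ...]
        for tokens in sentences:
            padded = [BOS, BOS] + tokens  # prepend BOS twice
            for i, word in enumerate(padded):
                unigram[word] += 1
                if i >= 1:
                    bigram[(padded[i - 1], word)] += 1
                if i >= 2:
                    trigram[(padded[i - 2], padded[i - 1], word)] += 1

    def pad_query(words):
        # Prepend BOS so even a zero- or one-word prompt has two tokens of context
        return [BOS, BOS] + list(words)

Counting bigrams over the padded sequence also gives you the (BOS, BOS) and (BOS, first-word) counts that the bigram denominator in your trigram probability needs. A query then becomes, for example, find_next_word_trigram(pad_query(["I"])), which looks up trigrams starting with (BOS, "I").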
Upvotes: 1