Frequency and next words for a word of a bigram list in python

Question

I have this sentence: 'Johnny Johnny yes papa', and I want to calculate the frequency of the next word for each word. To do this, I make the sentence circular:

sentence = 'Johnny Johnny yes papa'
sentence = sentence.split()
sentence.append(sentence[0])

Now the sentence is ['Johnny','Johnny','yes','papa','Johnny']

I create the bigrams in this way:

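def to_bigrams(my_list):
    # pair each word with the word that follows it; the last word has no successor
    bigrams = [(my_list[i], my_list[i+1]) for i, element in enumerate(my_list) if i < len(my_list) - 1]
    return bigrams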
And now my bigrams are: [('Johnny', 'Johnny'), ('Johnny', 'yes'), ('yes', 'papa'), ('papa', 'Johnny')]
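As an alternative sketch (not the asker's method), the same bigrams can be built by zipping the circular list with a copy of itself shifted by one position. zip stops at the shorter input, so no explicit index check is needed.

```python
# rebuild the circular sentence from the question
sentence = 'Johnny Johnny yes papa'.split()
sentence.append(sentence[0])

# pair each word with its successor; zip stops when sentence[1:] runs out
my_bigrams = list(zip(sentence, sentence[1:]))
print(my_bigrams)
# [('Johnny', 'Johnny'), ('Johnny', 'yes'), ('yes', 'papa'), ('papa', 'Johnny')]
```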
For example, Johnny has two outcomes, Johnny and yes; yes has only one outcome, papa; and papa has only one outcome, Johnny. So the expected dictionary is:
{'Johnny':['Johnny','yes'],'yes':['papa'],'papa':['Johnny']}

I have tried this:
my_freq_dict = {my_bigrams[i][0]:my_bigrams[i][j] for i,element in enumerate(my_bigrams) for j in range(len(my_bigrams))}

But I get this error: IndexError: tuple index out of range. There is something wrong with my logic; could you please help me?

Chris · Accepted Answer

One way is to use dict.setdefault:

my_bigrams = [('Johnny', 'Johnny'), ('Johnny', 'yes'), ('yes', 'papa'), ('papa', 'Johnny')]

d = {}
for v1, v2 in my_bigrams:
    # create an empty list the first time v1 is seen, then append its successor
    d.setdefault(v1, []).append(v2)
print(d)

Output:

{'Johnny': ['Johnny', 'yes'], 'yes': ['papa'], 'papa': ['Johnny']}
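Since the question asks for frequencies, one possible extension (not part of the original answer) is to count the successors instead of only listing them. This sketch swaps in collections.defaultdict with collections.Counter; the names freq, counts and probs are illustrative.

```python
from collections import Counter, defaultdict

my_bigrams = [('Johnny', 'Johnny'), ('Johnny', 'yes'), ('yes', 'papa'), ('papa', 'Johnny')]

# word -> Counter of the words that follow it
freq = defaultdict(Counter)
for current, nxt in my_bigrams:
    freq[current][nxt] += 1

# raw counts as plain dicts
counts = {word: dict(c) for word, c in freq.items()}
print(counts)
# {'Johnny': {'Johnny': 1, 'yes': 1}, 'yes': {'papa': 1}, 'papa': {'Johnny': 1}}

# relative frequencies: each count divided by that word's total number of successors
probs = {word: {n: k / sum(c.values()) for n, k in c.items()} for word, c in freq.items()}
print(probs)
# {'Johnny': {'Johnny': 0.5, 'yes': 0.5}, 'yes': {'papa': 1.0}, 'papa': {'Johnny': 1.0}}
```

The lists from the setdefault approach can still be recovered from the counts, for example with list(freq['Johnny']).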

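Your attempt raises the error because you are using len(my_bigrams) instead of len(element). Each element is a 2-tuple, so valid indices are only 0 and 1, but range(len(my_bigrams)) lets j go up to 3.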

Fixing it, however, won't yield the expected output: some keys appear more than once, so each new entry overwrites the previous value for that key (which is how a dict is meant to work).
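To illustrate both points, this sketch runs the asker's comprehension as written (catching the IndexError) and then with range(len(element)). The corrected version no longer crashes, but it keeps only the last successor seen for each key, and as a plain string rather than a list. The name fixed_attempt is illustrative.

```python
my_bigrams = [('Johnny', 'Johnny'), ('Johnny', 'yes'), ('yes', 'papa'), ('papa', 'Johnny')]

# original attempt: j runs up to len(my_bigrams) - 1 == 3, but each tuple only has indices 0 and 1
try:
    {my_bigrams[i][0]: my_bigrams[i][j]
     for i, element in enumerate(my_bigrams)
     for j in range(len(my_bigrams))}
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: tuple index out of range

# with len(element) the indices stay valid, but later entries overwrite earlier ones per key
fixed_attempt = {my_bigrams[i][0]: my_bigrams[i][j]
                 for i, element in enumerate(my_bigrams)
                 for j in range(len(element))}
print(fixed_attempt)  # {'Johnny': 'yes', 'yes': 'papa', 'papa': 'Johnny'}
```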
