Johannes Jamrosz

Reputation: 41

bigram occurrences to dictionary python

I would like to iterate through the list:

inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three']

and create a dictionary that maps each bigram of neighboring words to its number of occurrences. Reversed pairs should count as the same bigram, and pairs made of the same word twice should be excluded. So both ..'one', 'two'.. and ..'two', 'one'.. should add to the count of ('one', 'two') in the dictionary.

Expected output:

{('one', 'two'): 3, ('one', 'three'): 2, ('two', 'three'): 1}

So far I have tried with:

import itertools
from collections import Counter

inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three',]

coocurences = dict(Counter(itertools.combinations(inc_list, 2)))

print(coocurences)

This obviously counts every possible combination across the whole list, not just neighboring words, and it includes reversed as well as same-word combinations, so it is not what I am looking for.
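To make the difference concrete, here is a minimal sketch comparing what `itertools.combinations` produces with the neighboring pairs actually wanted. The names `all_pairs` and `adjacent_pairs` are introduced only for this illustration.

```python
from itertools import combinations

inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three']

# combinations() pairs every element with every later element,
# regardless of how far apart they are in the list.
all_pairs = list(combinations(inc_list, 2))

# Zipping the list with itself shifted by one yields only neighbors.
adjacent_pairs = list(zip(inc_list, inc_list[1:]))

print(len(all_pairs))       # 21 pairs for 7 items
print(len(adjacent_pairs))  # 6 neighboring pairs
print(adjacent_pairs)
```

For a list of 7 words there are 21 combinations but only 6 neighboring pairs, which is why the counts from `combinations` come out too high.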

Is there a tool in itertools that does something closer to my desired output?

I have found a lot of information about co-occurrence matrices, but I would prefer a dictionary as output.

Upvotes: 0

Views: 613

Answers (2)

kg_sYy

Reputation: 1215

As per my comment, you need to decide which ordering of each pair gets stored if you want only one entry per pair. Here is one possibility, which keeps whichever ordering appears first:

from collections import Counter

inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three',]

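bigrams = Counter()
# Pair each word with its successor to get the neighboring bigrams.
for previous, current in zip(inc_list, inc_list[1:]):
    opt1 = (previous, current)
    opt2 = (current, previous)
    # Count under the reversed key if it is already stored;
    # otherwise count under the ordering seen here.
    if opt2 in bigrams:
        bigrams[opt2] += 1
    else:
        bigrams[opt1] += 1
coocurences = dict(bigrams)
print(coocurences)

Output: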

{('one', 'two'): 3, ('one', 'three'): 2, ('three', 'two'): 1}
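An alternative way to define the ordering, sketched here as one option rather than the only approach, is to sort each pair so that both orderings collapse to the same key. This sketch also skips same-word pairs, which the question asked to exclude. The function name `count_unordered_bigrams` is hypothetical and introduced only for this example.

```python
from collections import Counter


def count_unordered_bigrams(words):
    """Count neighboring word pairs, ignoring order and same-word pairs."""
    counts = Counter()
    for previous, current in zip(words, words[1:]):
        # Skip pairs such as ('one', 'one').
        if previous == current:
            continue
        # Sorting makes ('two', 'one') and ('one', 'two') the same key.
        key = tuple(sorted((previous, current)))
        counts[key] += 1
    return dict(counts)


inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three']
print(count_unordered_bigrams(inc_list))
# {('one', 'two'): 3, ('one', 'three'): 2, ('three', 'two'): 1}
```

Because the keys are sorted alphabetically, the last pair is stored as ('three', 'two') rather than ('two', 'three'). On Python 3.10 or later, `itertools.pairwise(words)` can replace `zip(words, words[1:])`.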

Upvotes: 2

Johannes Jamrosz

Reputation: 41

Thank you for the quick response and the great suggestion. I modified it a little to give me exactly what I needed.

from collections import Counter

inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three',]

bigrams = Counter()
for previous, current in zip(inc_list, inc_list[1:]):
    opt1 = f"{previous}", f"{current}"
    opt2 = f"{current}", f"{previous}"
    if opt2 not in bigrams:
        bigrams[opt1] += 1
        continue
    bigrams[opt2] += 1
coocurences = dict(bigrams)
print(coocurences)

This prints:

{('one', 'two'): 3, ('one', 'three'): 2, ('three', 'two'): 1}

Thanks :)

Upvotes: 0
