I would like to iterate through the list:
inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three']
and create a dictionary that maps every bigram of neighboring words to its number of occurrences, counting a pair and its reverse as the same bigram and excluding pairs of the same word.
So both ..'one', 'two'.. and ..'two', 'one'.. should add to the count of ('one', 'two') in the dictionary.
Expected output:
{('one', 'two'): 3, ('one', 'three'): 2, ('two', 'three'): 1}
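To make "reversed order counts as equal" concrete, here is a minimal sketch (not from the thread) that keys each neighboring pair on a frozenset, which ignores element order. Same-word pairs are filtered out first because they would collapse to a one-element set. The names `neighbors` and `counts` are only illustrative, and the keys are frozensets rather than the tuples shown in the expected output.

```python
from collections import Counter

inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three']

# Neighboring pairs: each word paired with the word right after it.
neighbors = list(zip(inc_list, inc_list[1:]))

# A frozenset ignores order, so ('one', 'two') and ('two', 'one') map to the same key.
# Same-word pairs such as ('one', 'one') are skipped before counting.
counts = Counter(frozenset(pair) for pair in neighbors if pair[0] != pair[1])

print(counts)
```

The counts match the expected output (3, 2 and 1), but the element order inside each printed frozenset can differ between runs because string hashing is randomized.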
So far I have tried:
import itertools
from collections import Counter
inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three',]
coocurences = dict(Counter(itertools.combinations(inc_list, 2)))
print(coocurences)
This counts every pair of positions in the list, not just neighboring words; it also keeps reversed pairs separate and includes same-word pairs such as ('one', 'one'), so it is not what I am looking for.
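The difference is easy to see by counting the pairs each approach produces. This sketch (the names `all_pairs` and `neighbor_pairs` are illustrative) compares `itertools.combinations` with the common `zip(inc_list, inc_list[1:])` idiom for neighboring pairs:

```python
from itertools import combinations

inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three']

# combinations() pairs every position with every later position, adjacent or not.
all_pairs = list(combinations(inc_list, 2))

# zip() with a one-step offset pairs each word only with the word right after it.
neighbor_pairs = list(zip(inc_list, inc_list[1:]))

print(len(all_pairs))       # 21 (7 * 6 / 2)
print(len(neighbor_pairs))  # 6
print(neighbor_pairs)
```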
Is there a tool in itertools that does something closer to my desired output?
I have found a lot of information about co-occurrence matrices, but I would prefer a dictionary as output.
As per my comment, if both orders should count toward a single key, you need to decide which ordering of each pair is stored. Here is one possibility, which keeps the ordering that is seen first:
from collections import Counter

inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three']
bigrams = Counter()
for previous, current in zip(inc_list, inc_list[1:]):
    if previous == current:
        continue  # the question asks to exclude same-word pairs
    opt1 = (previous, current)
    opt2 = (current, previous)
    # Count under opt1 unless the reversed pair is already a key.
    if opt2 not in bigrams:
        bigrams[opt1] += 1
        continue
    bigrams[opt2] += 1
coocurences = dict(bigrams)
print(coocurences)
output:
{('one', 'two'): 3, ('one', 'three'): 2, ('three', 'two'): 1}
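This version keys each pair by whichever ordering it meets first, which is why the output contains ('three', 'two'): that pair first occurs as 'three', 'two' in the list. If the keys should match the expected output exactly, one option is to rank words by their first appearance in the list and sort each pair by that rank. This assumes the intended ordering is "first appearance", which the question does not state but the expected output suggests; `rank` is a hypothetical helper name:

```python
from collections import Counter

inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three']

# Assumed ordering rule: rank each distinct word by its first appearance.
# dict.fromkeys keeps first occurrences in order: one -> 0, two -> 1, three -> 2.
rank = {word: i for i, word in enumerate(dict.fromkeys(inc_list))}

bigrams = Counter()
for previous, current in zip(inc_list, inc_list[1:]):
    if previous == current:
        continue  # skip same-word pairs
    # Put the earlier-appearing word first so both orders share one key.
    key = tuple(sorted((previous, current), key=lambda w: rank[w]))
    bigrams[key] += 1

print(dict(bigrams))
# {('one', 'two'): 3, ('one', 'three'): 2, ('two', 'three'): 1}
```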
Thank you for the quick response and the great suggestion. I modified it a little to give me exactly what I needed.
from collections import Counter
inc_list = ['one', 'two', 'one', 'three', 'two', 'one', 'three',]
bigrams = Counter()
for previous, current in zip(inc_list, inc_list[1:]):
    opt1 = f"{previous}", f"{current}"
    opt2 = f"{current}", f"{previous}"
    if opt2 not in bigrams:
        bigrams[opt1] += 1
        continue
    bigrams[opt2] += 1
coocurences = dict(bigrams)
print(coocurences)
This prints:
{('one', 'two'): 3, ('one', 'three'): 2, ('three', 'two'): 1}
Thanks :)