MyTivoli

Reputation: 125

Python: count tuple occurrences in a list

Is there a way to count how many times each tuple occurs in this list of tokens?

I have tried the count method but it does not work.

This is the list:

['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay']

These are the tuples (pairs of consecutive tokens) based on the list:

('hello', 'how')
('how', 'are')
('are','you')
('you', 'doing')
('doing', 'today')
('today', 'are')
('you', 'okay')

I would like the result to be something like this

('hello', 'how') 1
('how', 'are') 1
('are', 'you') 2
('you', 'doing') 1
('doing', 'today') 1
('today', 'are') 1
('you', 'okay') 1

Upvotes: 4

Views: 4907

Answers (2)

MSeifert

Reputation: 152677

This solution requires a third-party package, iteration_utilities (it uses the package's Iterable class), but should do what you want:

>>> from iteration_utilities import Iterable

>>> l = ['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay']

>>> Iterable(l).successive(2).as_counter()
Counter({('are', 'you'): 2,
         ('doing', 'today'): 1,
         ('hello', 'how'): 1,
         ('how', 'are'): 1,
         ('today', 'are'): 1,
         ('you', 'doing'): 1,
         ('you', 'okay'): 1})
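For comparison, the same bigram count can be sketched with the standard library only. This is a minimal sketch, not part of the answer above; the names tokens and pair_counts are chosen here for illustration, and it assumes the tokens are in a list so that slicing works.

```python
from collections import Counter

tokens = ['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay']

# Pair each token with its successor; zip stops when the shorter
# input (tokens[1:]) is exhausted, so no out-of-range pair is built.
pair_counts = Counter(zip(tokens, tokens[1:]))

# On Python 3.7+ a Counter keeps keys in first-seen order, so this
# prints the pairs in the order they first appear in the list.
for pair, count in pair_counts.items():
    print(pair, count)
```

The loop prints one pair per line followed by its count, which is close to the layout asked for in the question.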

Upvotes: 6

willeM_ Van Onsem

Reputation: 476709

You can use a Counter for that. A generic function to count n-grams in a sequence such as a list is the following:

from collections import Counter
from itertools import islice

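def count_ngrams(iterable, n=2):
    # Build n views of the input, the i-th one starting at offset i,
    # and zip them so each output tuple is one n-gram.
    return Counter(zip(*[islice(iterable, i, None) for i in range(n)]))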

This generates:

>>> count_ngrams(['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'],2)
Counter({('are', 'you'): 2, ('doing', 'today'): 1, ('you', 'doing'): 1, ('you', 'okay'): 1, ('today', 'are'): 1, ('how', 'are'): 1, ('hello', 'how'): 1})
>>> count_ngrams(['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'],3)
Counter({('are', 'you', 'okay'): 1, ('you', 'doing', 'today'): 1, ('are', 'you', 'doing'): 1, ('today', 'are', 'you'): 1, ('how', 'are', 'you'): 1, ('doing', 'today', 'are'): 1, ('hello', 'how', 'are'): 1})
>>> count_ngrams(['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'],4)
Counter({('doing', 'today', 'are', 'you'): 1, ('today', 'are', 'you', 'okay'): 1, ('are', 'you', 'doing', 'today'): 1, ('how', 'are', 'you', 'doing'): 1, ('you', 'doing', 'today', 'are'): 1, ('hello', 'how', 'are', 'you'): 1})
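One caveat worth noting: calling islice several times on the same object works for a list, but if the input is a one-shot iterator (a generator, a file object), all the islice views share one underlying iterator and tokens get skipped. A possible variant for that case, sketched with itertools.tee, is shown here. The name count_ngrams_iter is hypothetical and not part of the answer above.

```python
from collections import Counter
from itertools import islice, tee

def count_ngrams_iter(iterable, n=2):
    # tee creates n independent iterators over the same input,
    # so advancing one copy does not consume items from the others.
    copies = tee(iterable, n)
    # Advance the i-th copy by i items so the copies are staggered.
    shifted = [islice(it, i, None) for i, it in enumerate(copies)]
    # zip stops at the shortest copy, yielding each full n-gram once.
    return Counter(zip(*shifted))

tokens = ['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay']

# iter(tokens) stands in for any one-shot iterator.
counts = count_ngrams_iter(iter(tokens), 2)
print(counts[('are', 'you')])
```

For list input this should give the same counts as count_ngrams; the trade-off is that tee buffers up to n - 1 items internally.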

Upvotes: 6
