George

Reputation: 903

Python: how to count how many times a word repeats sequentially

There are many counter snippets out there that I have stumbled across while attempting to do this, but none is quite right.

Given a string in which terms repeat, I want to group each term, but only when the repeats are sequential. For this string:

string="word, word, abc, stuff, word, stuff, stuff"

I would like to return a 'compressed' string:

word(2), abc, stuff, word, stuff(2)

Note that the order needs to be preserved, so I can't simply group all occurrences of each word together. Every word in the string is separated by a comma and a space (,\s in regex terms), so either a regex or string.split(',') can work.
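Both splitting options mentioned above give the same list for this input. A quick sketch, using the sample string from the question:

```python
import re

string = "word, word, abc, stuff, word, stuff, stuff"

# Option 1: split on the comma alone, then strip the leftover spaces
by_comma = [w.strip() for w in string.split(',')]

# Option 2: split on a comma followed by any amount of whitespace (",\s*")
by_regex = re.split(r',\s*', string)

print(by_comma)
print(by_comma == by_regex)
```

Option 2 also copes with inconsistent spacing such as "word,word" or "word,  word".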

Any thoughts on how to get a counter to count only words that repeat sequentially, and then how to store this information? I thought of using a dict, with the value as the counter that I increment by 1, but that didn't work because the keys repeat (i.e. there are two separate runs of "word" in the above string).
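To illustrate the dict problem described above: a plain dict keyed by word merges both runs of "word" into a single entry, so both the sequential grouping and the position of each run are lost. A minimal sketch:

```python
string = "word, word, abc, stuff, word, stuff, stuff"

# Every occurrence of the same word increments the same key,
# so non-adjacent runs are summed together instead of kept apart.
counts = {}
for w in string.split(', '):
    counts[w] = counts.get(w, 0) + 1

print(counts)  # {'word': 3, 'abc': 1, 'stuff': 3}
```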

Upvotes: 2

Views: 1638

Answers (3)

Eugene Soldatov

Reputation: 10135

You can do it without itertools too: just store the last processed element of the list in a variable and check whether the next element matches it:

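s = "word, word, abc, stuff, word, stuff, stuff"

words = []        # list of [word, count] pairs, one per run
last_word = None  # the word seen on the previous iteration
for word in s.split(', '):
    if word != last_word:
        # a new run starts: record the word with a count of 1
        words.append([word, 1])
        last_word = word
    else:
        # same word as before: extend the current run
        words[-1][1] += 1

# words == [['word', 2], ['abc', 1], ['stuff', 1], ['word', 1], ['stuff', 2]]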
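The loop above leaves its result as [word, count] pairs. A minimal sketch wrapping the same idea in a function and formatting the pairs into the compressed string from the question (the name compress_runs is hypothetical, and the input is assumed to use ", " as the separator):

```python
def compress_runs(s, sep=', '):
    """Collapse consecutive repeats of a word into 'word(n)'."""
    words = []        # [word, count] pairs, one per run
    last_word = None  # word from the previous iteration
    for word in s.split(sep):
        if word != last_word:
            words.append([word, 1])  # start a new run
            last_word = word
        else:
            words[-1][1] += 1        # extend the current run
    # Only runs longer than one get a "(n)" suffix
    return sep.join(w if n == 1 else '{}({})'.format(w, n) for w, n in words)


print(compress_runs("word, word, abc, stuff, word, stuff, stuff"))
# prints: word(2), abc, stuff, word, stuff(2)
```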

Upvotes: 1

Abhijit
Abhijit

Reputation: 63737

itertools.groupby is the right tool for this sort of task. You split your string and then group the items by successive repetition. Finally, it is trivial to reformat the data the way you want to present it:

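>>> import itertools
>>> string = "word, word, abc, stuff, word, stuff, stuff"
>>> # split on commas, strip spaces, then group consecutive equal words into (word, run length)
>>> groups = [(k, len(list(g)))
...           for k, g in itertools.groupby(map(str.strip, string.split(',')))]
>>> ', '.join("{}{}".format(k, '({})'.format(g) if g > 1 else '') for k, g in groups)
'word(2), abc, stuff, word, stuff(2)'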
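What makes groupby fit here is that it only merges adjacent equal items; collections.Counter, by contrast, counts every occurrence regardless of position. A short comparison on the already-split list:

```python
import itertools
from collections import Counter

items = ['word', 'word', 'abc', 'stuff', 'word', 'stuff', 'stuff']

# groupby starts a new group whenever the value changes,
# so non-adjacent repeats stay separate
runs = [(k, len(list(g))) for k, g in itertools.groupby(items)]
print(runs)  # [('word', 2), ('abc', 1), ('stuff', 1), ('word', 1), ('stuff', 2)]

# Counter tallies all occurrences, so the separate runs are merged
totals = Counter(items)
print(totals)
```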

Upvotes: 4

vks

Reputation: 67968

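import re

x = "word, word, abc, stuff, word, stuff, stuff"
# group 1 is a whole run of repeats, group 2 is the repeated word;
# \b stops the backreference from matching a mere prefix (e.g. "word, words")
print([j + "(" + str(i.count(j)) + ")" if i.count(j) > 1 else j for i, j in re.findall(r"((\w+)(?:,\s*\2\b)*)", x)])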

You can do this using re.

Output: ['word(2)', 'abc', 'stuff', 'word', 'stuff(2)']
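The list above can be joined with ', ' to get the exact compressed string from the question. A sketch of that, assuming every word consists only of \w characters (letters, digits, underscore); the function name compress_with_re is hypothetical. It counts each run's length from its commas, and the \b after the backreference keeps a word from matching the start of a longer one such as "word, words":

```python
import re

# Group 1: a whole run of one repeated word; group 2: the word itself.
# \b after the backreference \2 prevents matching a mere prefix ("word, words").
RUN = re.compile(r"((\w+)(?:,\s*\2\b)*)")


def compress_with_re(text):
    parts = []
    for run, word in RUN.findall(text):
        n = run.count(',') + 1  # a run of n words contains n - 1 commas
        parts.append(word if n == 1 else '{}({})'.format(word, n))
    return ', '.join(parts)


print(compress_with_re("word, word, abc, stuff, word, stuff, stuff"))
# prints: word(2), abc, stuff, word, stuff(2)
```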

Upvotes: 1
