Reputation: 903
I have come across many counting snippets while attempting this, but none is quite right.
Given a string with repeated terms, I want to group each term, but only when the repeats are consecutive. For this string:
string="word, word, abc, stuff, word, stuff, stuff"
I would like to return a 'compressed' string:
word(2), abc, stuff, word, stuff(2)
Note that the order needs to be preserved, so I can't simply group by each word. Every word in the string is separated by a comma followed by whitespace (,\s), so either a regex or string.split(',') can work.
Any thoughts on how to get a counter to count only words that repeat consecutively, and how to store that information? I thought of using a dict, with the count as the value, and adding 1 each time, but that didn't work because the keys repeat (i.e. there are two separate runs of word in the string above).
Upvotes: 2
Views: 1638
Reputation: 10135
You can do it without itertools too: just store the last processed element of the list in a variable and compare the next element against it:
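s = "word, word, abc, stuff, word, stuff, stuff"
words = []  # [word, count] pairs, in order of appearance
last_word = None
for word in s.split(', '):
    if word != last_word:
        # a new run starts here
        words.append([word, 1])
        last_word = word
    else:
        # same word as the previous one: extend the current run
        words[-1][1] += 1
# words == [['word', 2], ['abc', 1], ['stuff', 1], ['word', 1], ['stuff', 2]]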
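The loop above stops at a list of pairs, while the question asks for a string. A minimal sketch of the remaining formatting step, wrapped in a function; the name compress and the sep parameter are hypothetical and not part of the answer:

```python
def compress(s, sep=", "):
    """Collapse consecutive repeats in a separator-delimited string."""
    runs = []  # [word, count] pairs, in original order
    last_word = None
    for word in s.split(sep):
        if word != last_word:
            runs.append([word, 1])
            last_word = word
        else:
            runs[-1][1] += 1
    # Only runs longer than one get a "(n)" suffix.
    return sep.join(
        "{}({})".format(word, count) if count > 1 else word
        for word, count in runs
    )


print(compress("word, word, abc, stuff, word, stuff, stuff"))
# word(2), abc, stuff, word, stuff(2)
```

Because the pairs live in a list rather than a dict, a word that shows up again later simply starts a new entry, which is what keeps the order intact.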
Upvotes: 1
Reputation: 63737
itertools.groupby is the right tool for this sort of task. Generally, you split the string and then group on successive repetition. After that, it is trivial to reformat the data in the form you want to present:
>>> import itertools
>>> string = "word, word, abc, stuff, word, stuff, stuff"
>>> groups = [(k, len(list(g)))
...           for k, g in itertools.groupby(map(str.strip, string.split(',')))]
>>> ', '.join("{}{}".format(k, ['','({})'.format(g)][g > 1]) for k, g in groups)
'word(2), abc, stuff, word, stuff(2)'
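The same idea can be written as a small function, which may be easier to read than the nested one-liner. This is only a sketch: the name compress_runs is made up here, and it assumes that stripping whitespace around each comma-separated item is acceptable.

```python
from itertools import groupby


def compress_runs(text):
    """Group consecutive equal words and append a count to runs longer than one."""
    words = [w.strip() for w in text.split(",")]
    parts = []
    # groupby only merges adjacent equal items, so a later "word" forms its own group.
    for word, run in groupby(words):
        count = sum(1 for _ in run)  # length of the run without building a list
        parts.append("{}({})".format(word, count) if count > 1 else word)
    return ", ".join(parts)


print(compress_runs("word, word, abc, stuff, word, stuff, stuff"))
# word(2), abc, stuff, word, stuff(2)
```

Since groupby never looks past neighbouring items, no sorting is involved and the original order is preserved.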
Upvotes: 4
Reputation: 67968
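You can do this using re.
import re
x = "word, word, abc, stuff, word, stuff, stuff"
# Group 1 is the whole run, group 2 the repeated word; \b stops "stuff" from matching inside "stuffing".
print([j + "(" + str(i.count(j)) + ")" if i.count(j) > 1 else j
       for i, j in re.findall(r"(\b(\w+)\b(?:,\s*\2\b)*)", x)])
Output: ['word(2)', 'abc', 'stuff', 'word', 'stuff(2)']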
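If a string is wanted rather than a list, one variation is to let re.sub rewrite only the runs and leave everything else untouched. A rough sketch, assuming every term consists of \w characters only; RUN and compress_with_sub are hypothetical names:

```python
import re

# A word followed by one or more ", <same word>" repeats; \b forces whole-word matches.
RUN = re.compile(r"\b(\w+)\b(?:,\s*\1\b)+")


def compress_with_sub(text):
    def label(match):
        word = match.group(1)
        # Every word in the matched run is the same, so counting words gives the run length.
        count = len(re.findall(r"\w+", match.group(0)))
        return "{}({})".format(word, count)

    return RUN.sub(label, text)


print(compress_with_sub("word, word, abc, stuff, word, stuff, stuff"))
# word(2), abc, stuff, word, stuff(2)
```

Single words never match the pattern, so they pass through unchanged along with their separators.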
Upvotes: 1