yanachen

Reputation: 3753

How to remove characters which repeat more than twice in a string?

For example, I want to turn hhhaaappy into hhaappy: h and a each repeat three times in a row, so each run should be cut down to two characters. In general, I want to shorten every run of a character longer than two down to two characters. How can I do this quickly in Python?

Also, is there a Python module that can correct the word, e.g. turn hhhaaappy into happy?

Upvotes: 2

Views: 1460

Answers (2)

Samuel L.

Reputation: 74

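I thought it would be worth sharing this: there is a module called autocorrect. (The session below uses the older `spell` function; newer releases of autocorrect provide a `Speller` class instead, e.g. `Speller(lang='en')`.)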

It works with a candidate model: it generates candidate words by applying simple edits to the input, namely deletion (remove a letter), transposition (swap two adjacent letters), replacement (change one letter to another) and insertion (add a letter).
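To make the candidate model concrete, here is a minimal sketch that generates every string exactly one simple edit away from a word, following the approach popularised by Peter Norvig's spelling-corrector essay. It illustrates the idea only and is not autocorrect's actual source; `edits1` is a hypothetical name, and only lowercase ASCII letters are considered for replacements and insertions.

```python
import string

def edits1(word):
    """Return every string exactly one simple edit away from `word` (lowercase a-z only)."""
    letters = string.ascii_lowercase
    # Every way to cut the word into a (left, right) pair
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

print('happy' in edits1('happpy'))     # True: delete one p
print('shapy' in edits1('hhapy'))      # True: replace the first h with s
print('happy' in edits1('hhhaaappy'))  # False: happy is four deletions away
```

Because one edit changes the length by at most one, a nine-letter input like hhhaaappy can never reach five-letter happy within one or two edits.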

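Therefore hhhaaappy, which is four deletions away from happy, is returned unchanged, while inputs only one or two edits away, such as happpy or hhapppy, get corrected. Note that hhapy comes back as shapy, presumably because shapy is a single edit away while happy needs two.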

>>> from autocorrect import spell
>>> spell('hhhaaappy')
'hhhaaappy'
>>> spell('hhapy')
'shapy'
>>> spell('happpy')
'happy'
>>> spell('hhapppy')
'happy'
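Since the candidate model only considers a few simple edits, one way to handle hhhaaappy is to shorten each run of repeated letters to one or two characters first and then check the results against a word list. The sketch below assumes you supply your own set of valid words; `run_variants`, `correct_repeats` and `VOCABULARY` are hypothetical names, and the tiny vocabulary is for illustration only.

```python
import itertools

def run_variants(word):
    """Yield every spelling where each run of a repeated letter is cut to length 1 or 2."""
    runs = [(ch, len(list(group))) for ch, group in itertools.groupby(word)]
    # Allowed lengths per run: just 1, or 1 and 2 if the run was at least 2 long
    choices = [[ch * n for n in range(1, min(count, 2) + 1)] for ch, count in runs]
    for combo in itertools.product(*choices):
        yield ''.join(combo)

def correct_repeats(word, vocabulary):
    """Return the first run-shortened variant found in `vocabulary`, else `word` unchanged."""
    for candidate in run_variants(word):
        if candidate in vocabulary:
            return candidate
    return word

# Hypothetical tiny word list; in practice you would load a real dictionary
VOCABULARY = {"happy", "book", "cool"}

print(correct_repeats("hhhaaappy", VOCABULARY))  # happy
print(correct_repeats("cooool", VOCABULARY))     # cool
```

The number of variants doubles for every repeated run, which is fine for ordinary words; the shortened result could also be passed on to autocorrect if it still is not a known word.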

Upvotes: 3

Ajax1234

Reputation: 71461

You can use itertools.groupby:

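import itertools

s = "hhhaaappy"
# Split the string into runs of identical characters: ('h', ['h', 'h', 'h']), ('a', [...]), ...
new_s = [(a, list(b)) for a, b in itertools.groupby(s)]
# Keep at most two characters from each run, so a run like "hhhh" also becomes "hh"
final_s = ''.join(''.join(b[:2]) for a, b in new_s)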

Output:

'hhaappy'
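As an alternative to groupby, a regular expression with a backreference does the same job in a single pass inside the `re` engine, which may be faster on long strings (worth measuring with timeit for your own data). `cap_repeats` and its `max_run` parameter are hypothetical names for this sketch.

```python
import re

def cap_repeats(s, max_run=2):
    """Shorten every run of the same character to at most `max_run` characters."""
    # (.) captures one character; \1{max_run,} requires max_run or more further copies,
    # so only runs longer than max_run match. DOTALL lets . match newlines too.
    pattern = re.compile(r'(.)\1{%d,}' % max_run, re.DOTALL)
    return pattern.sub(lambda m: m.group(1) * max_run, s)

print(cap_repeats("hhhaaappy"))      # hhaappy
print(cap_repeats("hhhhaaappy"))     # hhaappy
print(cap_repeats("aaaabbb", 3))     # aaabbb
```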

Upvotes: 6
