boje

Reputation: 880

python regular expression to remove repeated words

I am very new to Python.

I want to change a sentence if it contains repeated words.

Correct

Right now I am using this regex, but it also matches on letters, not just whole words. Ex. "My friend and i is happy" --> "My friend and is happy" (it removes the "i" and the space), which is wrong.

text = re.sub(r'(\w+)\1', r'\1', text)  # remove duplicated words in a row

How can I make the same change so that it checks whole words instead of letters?

Upvotes: 7

Views: 10202

Answers (3)

SATYAM TRIPATHI

Reputation: 67

  • \b: Matches Word Boundaries

  • \w: Any word character

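  • \1: Backreference to the first capture group. Inside the pattern it matches a repeat of the captured word; in the replacement it keeps only the first occurrence.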

    import re


    def remove_duplicates(text):
        # A word followed by one or more repeats of itself, each
        # preceded by exactly one non-word character (\W).
        pattern = r"\b(\w+)(?:\W\1\b)+"
        # Keep the first occurrence; IGNORECASE treats "Hello hello" as a repeat.
        return re.sub(pattern, r"\1", text, flags=re.IGNORECASE)


    test_string1 = "Good bye bye world world"
    test_string2 = "Ram went went to to his home"
    test_string3 = "Hello hello world world"
    print(remove_duplicates(test_string1))
    print(remove_duplicates(test_string2))
    print(remove_duplicates(test_string3))

Result:

    Good bye world
    Ram went to his home
    Hello world
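One thing to watch for: \W matches exactly one non-word character, so repeats separated by two spaces, or by a comma plus a space, are left alone. A minimal sketch of the difference, assuming you also want to collapse those cases. The `\W+` variant and the names `SINGLE_SEP` and `MULTI_SEP` are my own illustration, not part of the answer above:

```python
import re

# Pattern from the answer above: exactly one separator character between repeats.
SINGLE_SEP = re.compile(r"\b(\w+)(?:\W\1\b)+", re.IGNORECASE)

# Hypothetical variant: one or more separator characters between repeats.
MULTI_SEP = re.compile(r"\b(\w+)(?:\W+\1\b)+", re.IGNORECASE)

sample = "bye  bye, bye world"

single_result = SINGLE_SEP.sub(r"\1", sample)
multi_result = MULTI_SEP.sub(r"\1", sample)

print(single_result)  # bye  bye, bye world
print(multi_result)   # bye world
```

Note that the `\W+` variant also swallows the punctuation between the repeats, which may or may not be what you want.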

Upvotes: 0

tom

Reputation: 22939

text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)  # remove duplicated words in a row

The \b matches the empty string, but only at the beginning or end of a word.
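To see why the boundaries matter, here is a small sketch run on the sentence from the question. The unanchored pattern below includes a space before \1, which is an assumption on my part (the pattern quoted in the question has no space), but it reproduces the behaviour described there:

```python
import re

sentence = "My friend and i is happy"

# Without \b, "i" followed by " i" (the start of " is") counts as a repeat.
without_boundaries = re.sub(r'(\w+)( \1)+', r'\1', sentence)

# With \b on both sides, only whole repeated words can match.
with_boundaries = re.sub(r'\b(\w+)( \1\b)+', r'\1', sentence)

# A genuine run of repeated words is still collapsed.
collapsed = re.sub(r'\b(\w+)( \1\b)+', r'\1', "this just so so so nice")

print(without_boundaries)  # My friend and is happy
print(with_boundaries)     # My friend and i is happy
print(collapsed)           # this just so nice
```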

Upvotes: 9

Ashwini Chaudhary

Reputation: 250881

Non-regex solution using itertools.groupby:

>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice" 
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
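If you want this as a reusable function, something like the following might work. It is only a sketch: the name `collapse_repeats` and the `ignore_case` flag are hypothetical additions of mine. Like the one-liner above, it splits on whitespace, so runs of spaces are normalised to single spaces and punctuation stays attached to its word.

```python
from itertools import groupby


def collapse_repeats(text, ignore_case=False):
    """Collapse runs of consecutive identical words into one word."""
    words = text.split()
    # With ignore_case, group on the lowercased word but keep the
    # original spelling of the first word in each run.
    key = str.lower if ignore_case else None
    return " ".join(next(group) for _, group in groupby(words, key=key))


print(collapse_repeats("this is just is is"))
print(collapse_repeats("Hello hello world world", ignore_case=True))
```

By default the comparison is case-sensitive, so "Hello hello" is left unchanged unless `ignore_case=True` is passed.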

Upvotes: 9
