capture repetition of letters in a word with regex

Question

I'm trying to detect words that contain runs of a repeated letter, and I would like to replace each such run with a single copy of that letter. The text is in Hebrew. For instance, שללללוווווםםםם should just become שלום. Basically, when a letter repeats itself 3 times or more in a row, it should be detected and replaced.

I want to use the regular expression with R's gsub:

df$text <- gsub("?", "?", df$text)

rock321987 · Accepted Answer

You can use

> x = "שללללוווווםםםם"
> gsub("(.)\\1{2,}", "\\1", x)
#[1] "שלום"
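The question applies the substitution to a data frame column. A minimal sketch of that, assuming a hypothetical df with a character column named text (ASCII sample strings are used only so the example does not depend on the session locale):

```r
# Hypothetical stand-in for the question's df; the real data is not shown
df <- data.frame(text = c("heeeellooo wooorld", "nooooo!!!", "ok"),
                 stringsAsFactors = FALSE)

# In an R string literal the backreference must be written "\\1";
# a plain "\1" is an octal escape for the control character U+0001
df$text <- gsub("(.)\\1{2,}", "\\1", df$text)

# df$text is now c("hello world", "no!", "ok")
print(df$text)
```

The double l in "hello" survives because a run of two is below the three-character threshold.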

NOTE: This replaces any character (not just Hebrew letters) that occurs three or more times in a row.
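To see where the threshold sits: (.)\\1{2,} needs the captured character plus at least two more copies of it, so only runs of three or more are collapsed. The sketch below compares that with a hypothetical variant using + (the same as {1,}), which would also collapse doubled characters:

```r
# Runs of "a" with lengths 1 through 5
runs <- strrep("a", 1:5)

# Pattern from the answer: runs of length 3 or more become one character
three_plus <- gsub("(.)\\1{2,}", "\\1", runs)

# Variant: "+" also collapses runs of length 2
two_plus <- gsub("(.)\\1+", "\\1", runs)

print(data.frame(run = runs, three_plus = three_plus, two_plus = two_plus))
```

Keeping the quantifier at {2,} matches the question's "3 times or more" rule and leaves ordinary doubled letters alone.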

Or use the following to collapse only word characters (letters and digits from any language, plus underscore):

> gsub("(\\w)\\1{2,}", "\\1", x)
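A quick comparison of the two patterns on a hypothetical ASCII string that ends in a run of punctuation, assuming R's default regex engine (no perl = TRUE):

```r
x <- "nooooo!!!"

# "." matches any character, so the "!!!" run is collapsed as well
any_char <- gsub("(.)\\1{2,}", "\\1", x)

# "\\w" matches only word characters, so the punctuation run is kept
word_only <- gsub("(\\w)\\1{2,}", "\\1", x)

print(c(any_char = any_char, word_only = word_only))
```

Here any_char is "no!" while word_only is "no!!!", which is the practical difference between the two answers.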
