Dmitry Leykin

Reputation: 505

capture repetition of letters in a word with regex

I'm trying to detect words in which a letter is repeated, and I would like to replace each such run with a single copy of the repeated letter. The text is in Hebrew. For instance, שללללוווווםםםם should just become שלום. Basically, when a letter repeats itself 3 times or more in a row, it should be detected and replaced.

I want to use a regular expression with R's gsub:

df$text <- gsub("?", "?", df$text)

Upvotes: 1

Views: 228

Answers (2)

Wiktor Stribiżew

Reputation: 626709

If you only want to collapse repeated characters from the Hebrew script (and leave other repeated characters alone), I'd suggest:

s <- "שללללוווווםםםם .........         שללללוווווםםםם"
gsub("(\\p{Hebrew})\\1{2,}", "\\1", s, perl=TRUE)

See the regex demo in R

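Details:

- (\p{Hebrew}): Group 1, a single character from the Hebrew script
- \1{2,}: two or more further occurrences of the character captured in Group 1, so the whole run is at least three identical letters

The \1 replacement keeps one copy of the captured letter. perl=TRUE is required because \p{Hebrew} is a PCRE Unicode script class that R's default (TRE) engine does not support.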
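Applied to the df$text column from the question, a minimal sketch might look like this. The data frame is made-up sample data, and the Hebrew letters are written as \u escapes, an assumption made so the strings are UTF-8 regardless of the locale R runs in.

```r
# Hypothetical sample data: Hebrew letters built from Unicode escapes so the
# strings are marked UTF-8 whatever the session locale is
shin  <- "\u05E9"
lamed <- "\u05DC"
vav   <- "\u05D5"
mem   <- "\u05DD"  # final mem

# Same shape as the question's example: one shin, then runs of 4, 5 and 4
stretched <- paste0(shin, strrep(lamed, 4), strrep(vav, 5), strrep(mem, 4))
shalom    <- paste0(shin, lamed, vav, mem)

df <- data.frame(
  text = c(stretched, paste("!!!!", stretched)),
  stringsAsFactors = FALSE
)

# Collapse 3+ consecutive copies of the same Hebrew letter into one copy;
# repeated non-Hebrew characters such as "!!!!" are left untouched
df$text <- gsub("(\\p{Hebrew})\\1{2,}", "\\1", df$text, perl = TRUE)

print(df$text)
```

Both rows end up with the collapsed word, and the "!!!!" prefix in the second row survives because only Hebrew-script characters can be captured.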

Upvotes: 2

rock321987

Reputation: 11032

You can use

> x = "שללללוווווםםםם"
> gsub("(.)\\1{2,}", "\\1", x)
#[1] "שלום"

NOTE: This replaces any character (not just Hebrew) that is repeated three or more times in a row.
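To see the "three or more" threshold in action, here is a small sketch with made-up ASCII inputs (chosen only so the output is easy to read). A doubled letter survives, while any run of three or more identical characters, punctuation and spaces included, is collapsed. The second call swaps {2,} for + (a variation not in the original answer), which collapses doubled characters as well.

```r
x <- c("book", "boook", "wait...", "hmm   ok")

# (.) captures any character; \\1{2,} needs at least two more copies of it,
# so only runs of 3 or more identical characters are replaced
three_plus <- gsub("(.)\\1{2,}", "\\1", x)
print(three_plus)

# Variation: \\1+ needs just one more copy, so doubled characters collapse too
two_plus <- gsub("(.)\\1+", "\\1", x)
print(two_plus)
```

The + variant is risky for real text, since it also damages words that legitimately contain a doubled letter.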

Or, to collapse only word characters (letters, digits, and underscore) from any language, use:

> gsub("(\\w)\\1{2,}", "\\1", x)
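The two patterns differ on non-word characters. A minimal sketch with a made-up ASCII string: (.) also collapses repeated punctuation and spaces, while (\w) only touches letters, digits, and underscores.

```r
y <- "wow!!!   nooooo"

# Any repeated character, punctuation and whitespace included
any_char <- gsub("(.)\\1{2,}", "\\1", y)
print(any_char)

# Only word characters (letters, digits, underscore) are collapsed
word_only <- gsub("(\\w)\\1{2,}", "\\1", y)
print(word_only)
```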

Upvotes: 4
