Reputation: 505
I'm trying to detect words that contain repeated letters, and I would like to replace each such run with a single instance of the repeated letter. The text is in Hebrew. For instance, שללללוווווםםםם should just become שלום.
Basically, when a letter repeats itself 3 times or more, it should be detected and replaced.
I want to use a regular expression with R's gsub.
df$text <- gsub("?", "?", df$text)
Upvotes: 1
Views: 228
Reputation: 626709
If you plan to only remove repeating characters from the Hebrew script (keeping others), I'd suggest:
s <- "שללללוווווםםםם ......... שללללוווווםםםם"
gsub("(\\p{Hebrew})\\1{2,}", "\\1", s, perl=TRUE)
See the regex demo in R
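Applied to a data frame column, as in the question, this could look like the sketch below. The data frame here is a made-up stand-in for the asker's df, and it assumes the strings are UTF-8 encoded:

```r
# Hypothetical data frame standing in for the asker's `df`
df <- data.frame(
  text = c("שללללוווווםםםם", "helloooo ......... שללללום"),
  stringsAsFactors = FALSE
)

# Collapse every run of 3 or more identical Hebrew characters to one character;
# perl = TRUE is needed for the \p{Hebrew} script class
df$text <- gsub("(\\p{Hebrew})\\1{2,}", "\\1", df$text, perl = TRUE)

print(df$text)
```

Non-Hebrew runs such as the repeated "o" and the dots in the second row are left as they are.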
Details:
(\\p{Hebrew}) - Group 1, capturing a character from the Hebrew script (\p{Hebrew} is a Unicode script property class)
\\1{2,} - 2 or more ({2,} is a limiting quantifier) occurrences of the same character stored in Group 1 (\\1 is a backreference to the Group 1 contents).
Upvotes: 2
Reputation: 11032
You can use
> x = "שללללוווווםםםם"
> gsub("(.)\\1{2,}", "\\1", x)
#[1] "שלום"
NOTE: It will replace any character (not just Hebrew) that is repeated three or more times.
Or use the following to collapse only word characters (letters/digits from any language):
> gsub("(\\w)\\1{2,}", "\\1", x)
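The {2,} quantifier applies only to the backreference, so together with the captured character the pattern needs a run of three or more. A small sketch that makes this threshold explicit (collapse_repeats and min_run are hypothetical names, not part of base R):

```r
# Hypothetical helper: collapse runs of at least `min_run` identical characters
collapse_repeats <- function(x, min_run = 3) {
  stopifnot(min_run >= 2)
  # The captured character counts once, so the backreference
  # must repeat at least (min_run - 1) more times
  pattern <- sprintf("(.)\\1{%d,}", as.integer(min_run - 1))
  gsub(pattern, "\\1", x)
}

collapse_repeats("aa bbb cccc")          # "aa" is kept, longer runs collapse
collapse_repeats("aaa", min_run = 4)     # run is too short, left unchanged
```

Swapping (.) for (\\w) or, with perl = TRUE, (\\p{Hebrew}) narrows which characters are collapsed.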
Upvotes: 4