Reputation: 31
In GNU R, I need to remove repeated characters at the beginning and end of every word of a string.
For example, given the input
str <- "Tthis iss a splendiddd ddayyy"
The output should be
"This is a splendid day"
Does anyone know how to do this? Thank you very much in advance!
With best wishes, Eric
Upvotes: 3
Views: 174
Reputation: 269654
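The first gsub removes duplicate leading characters and the second removes duplicate trailing ones. The first regular expression matches a word boundary, followed by any character, followed by one or more further copies of that same character. It then replaces the whole match with the character matched by the capture group, i.e. the part within parentheses. Because ignore.case = TRUE is set, upper and lower case are treated as the same, so "Tt" counts as a repeat and is replaced by its first character, "T". The second works similarly for trailing duplicates, with the word boundary placed after the repeated run; as written it is case-sensitive.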
ss <- gsub("\\b(.)\\1+", "\\1", str, ignore.case = TRUE, perl = TRUE)
gsub("(.)\\1+\\b", "\\1", ss)
## [1] "This is a splendid day"
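If this is needed in more than one place, the two calls can be wrapped in a small function. The following is only a sketch: the name trim_repeats and its ignore_case argument are made up for illustration, and it differs from the code above in two ways that are my own assumptions about what is wanted. It uses \\w instead of . so that only letters, digits and underscores are collapsed (with . a run of spaces after a word would be collapsed too), and it applies the case setting to both passes.

```r
# Hypothetical helper: collapse a run of one repeated character at the
# start and at the end of each word down to a single character.
trim_repeats <- function(x, ignore_case = TRUE) {
  # Leading runs: word boundary, one word character, then repeats of it.
  x <- gsub("\\b(\\w)\\1+", "\\1", x, ignore.case = ignore_case, perl = TRUE)
  # Trailing runs: one word character, repeats of it, then a word boundary.
  gsub("(\\w)\\1+\\b", "\\1", x, ignore.case = ignore_case, perl = TRUE)
}

trim_repeats("Tthis iss a splendiddd ddayyy")
## [1] "This is a splendid day"

# Caveat: a regex cannot tell a typo from a legitimate double letter.
trim_repeats("too good")
## [1] "to good"
```

As the second call shows, correctly spelled words that begin or end with a doubled letter (too, egg, eel) are shortened as well, so this is only safe when such words cannot occur in the input or when the result is checked afterwards.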
Upvotes: 1