user3771059

Reputation: 31

GNU R: How to remove repeated characters at the beginning and end of every word of a string?

In GNU R, I need to remove repeated characters at the beginning and end of every word of a string.

For example, given the input

str <- "Tthis iss a splendiddd ddayyy"

The output should be

"This is a splendid day"

Does anyone know how to do this? Thank you very much in advance!

With best wishes, Eric

Upvotes: 3

Views: 174

Answers (1)

G. Grothendieck

Reputation: 269654

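The first gsub removes duplicated leading characters and the second removes duplicated trailing ones. The first regular expression matches a word boundary (\\b), followed by any single character captured in a group, followed by one or more repetitions of that same character (\\1+). The whole match is then replaced with the character matched by the capture group, i.e. the part within parentheses. Because of ignore.case = TRUE, upper and lower case are treated as the same character, so "Tt" collapses to "T". The second works similarly for trailing duplicates, with the word boundary at the end of the pattern instead of the start.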

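# leading duplicates: word boundary, one character, then that character again
ss <- gsub("\\b(.)\\1+", "\\1", str, ignore.case = TRUE, perl = TRUE)
# trailing duplicates: same idea, with the word boundary at the end
gsub("(.)\\1+\\b", "\\1", ss, ignore.case = TRUE, perl = TRUE)
## [1] "This is a splendid day"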
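
If this is needed in more than one place, the two steps could be wrapped in a small function. This is only a sketch: the name trim_repeats is made up for illustration, and it passes ignore.case = TRUE and perl = TRUE to both calls, which is an assumption about the desired behaviour for trailing duplicates. The second call shows a limitation of the approach: the pattern cannot tell a typo from a legitimately doubled letter at the start or end of a word.

```r
# Hypothetical helper (name chosen for illustration only): collapse runs of a
# repeated character at the start and at the end of every word.
trim_repeats <- function(x) {
  # Leading runs: word boundary, one character, then that character 1+ times.
  x <- gsub("\\b(.)\\1+", "\\1", x, ignore.case = TRUE, perl = TRUE)
  # Trailing runs: one character repeated 1+ times, then a word boundary.
  gsub("(.)\\1+\\b", "\\1", x, ignore.case = TRUE, perl = TRUE)
}

trim_repeats("Tthis iss a splendiddd ddayyy")
## [1] "This is a splendid day"

# Limitation: genuine double letters at a word edge are collapsed as well.
trim_repeats("tall eel")
## [1] "tal el"
```

If words such as "tall" or "eel" can occur in the input, the result would need an extra check, for example against a dictionary, which is outside what a regular expression alone can do.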

Upvotes: 1
