Reputation: 143
I have the following String
string <- "Actividades Financieras && Bancarias #### 23"
I would like to collapse repeated spaces, repeated & and repeated # into a single occurrence of each. I'd like to get:
Actividades Financieras & Bancarias # 23
So far I have tried
gsub("[^A-z0-9]+"," ",string)
and I get
"Actividades Financieras Bancarias 23"
It removes every # and &.
Is there a way to keep a single occurrence of each character?
Thanks.
Upvotes: 3
Views: 1420
Reputation: 626870
You may use
string <- "Actividades Financieras && Bancarias #### 23"
gsub("([#&\\s])\\1+", "\\1", string, perl=TRUE)
# => [1] "Actividades Financieras & Bancarias # 23"
See the regex demo and an online R demo.
Details
- ([#&\\s]) - Capturing group 1 matching a #, & or whitespace
- \\1+ - a backreference to the Group 1 value, matching it 1 or more times (due to the + quantifier).

The match is replaced with a single occurrence of the captured character (the \1 placeholder in the replacement pattern references the Group 1 value).
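As a quick check of the explanation above, here is a small sketch on made-up inputs (the variables x and mixed and their contents are my own illustration, not taken from the question). It suggests that only runs of the same character are collapsed, because the backreference has to repeat exactly what Group 1 captured, so a mixed sequence such as &#& should be left alone.

```r
# Hypothetical input: repeated spaces, ampersands and hashes
x <- "a  b &&& c ## d"
collapsed <- gsub("([#&\\s])\\1+", "\\1", x, perl = TRUE)
print(collapsed)
# => [1] "a b & c # d"

# Different neighbouring characters are not a run of one character,
# so the pattern finds nothing to replace
mixed <- "x &#& y"
unchanged <- gsub("([#&\\s])\\1+", "\\1", mixed, perl = TRUE)
print(unchanged)
# => [1] "x &#& y"
```

If you also want mixed runs like &#& reduced, this pattern is not enough on its own and you would need a different rule for which character to keep.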
Upvotes: 8