CamiloYateT

Reputation: 143

R - Replace multiple occurrences of a character with a single one (regex)

I have the following string:

str<-"Actividades   Financieras && Bancarias #### 23"

I would like to collapse each run of repeated spaces, repeated & and repeated # into a single character. I'd like to get:

Actividades Financieras & Bancarias # 23

So far I have tried

gsub("[^A-z0-9]+"," ",str)

and I get

"Actividades Financieras Bancarias 23"

This removes every # and & instead of keeping one of each.

Is there a way to keep a single occurrence of each repeated character?

Thanks.

Upvotes: 3

Views: 1420

Answers (1)

Wiktor Stribiżew

Reputation: 626870

You may use

string <- "Actividades   Financieras && Bancarias #### 23"
gsub("([#&\\s])\\1+", "\\1", string, perl=TRUE)
# => [1] "Actividades Financieras & Bancarias # 23"

See the regex demo and an online R demo.

Details

  • ([#&\\s]) - Capturing group 1, matching a single #, & or whitespace character
  • \\1+ - a backreference to the Group 1 value, matching that same character 1 or more times (due to the + quantifier).

The whole match is replaced with a single occurrence of the captured character: the \\1 placeholder in the replacement string refers to the Group 1 value.
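
To make the role of the backreference concrete, here is a minimal sketch that wraps the same gsub() call in a small function. The name collapse_repeats and its class argument are made up for illustration and are not part of base R. It shows that only runs of the same character collapse, and that the character class can be swapped for a broader one.

```r
# collapse_repeats() is a hypothetical helper name, not a base R function.
# `class` is the body of a PCRE character class (assumed to be valid as given).
collapse_repeats <- function(x, class = "#&\\s") {
  # Build "([class])\\1+": capture one character from the class, then require
  # one or more further copies of that same character.
  pattern <- paste0("([", class, "])\\1+")
  gsub(pattern, "\\1", x, perl = TRUE)
}

string <- "Actividades   Financieras && Bancarias #### 23"
collapse_repeats(string)
# [1] "Actividades Financieras & Bancarias # 23"

# Adjacent but different characters are left alone, because \\1 only
# matches the exact character captured by Group 1.
collapse_repeats("x &&# y")
# [1] "x &# y"

# Letters are outside the class, so doubled letters survive.
collapse_repeats("aa  bb")
# [1] "aa bb"

# A broader class: any non-alphanumeric character.
collapse_repeats("a--b!!!c", class = "^[:alnum:]")
# [1] "a-b!c"
```

The class is pasted into the pattern as-is, so characters that are special inside brackets (such as ] or a leading ^) would need escaping by the caller.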

Upvotes: 8
