screechOwl

Reputation: 28169

R remove special character and repeating underscores

I have a dataset that contains spaces and other punctuation characters. I'm trying to replace the spaces and special characters with "_". This creates spots with multiple "_" strung together, so I'd like to collapse those into a single "_" as well, using the following function as described here:

removeSpace <- function(x){
    class1 <- class(x)
    x <- as.character(x)
    x <- gsub(" |&|-|/|'|(|)",'_', x) # convert special characters to _
    x <- gsub("([_])\\1+","\\1", x)   # convert multiple _ to single _

    if(class1 == 'character'){
        return(x)
    }
    if(class1 == 'factor'){
        return(as.factor(x))
    }
}

The issue is that instead of replacing only the spaces and special characters with "_", it puts a "_" between every character (i.e. "test" -> "t_e_s_t").

What am I doing wrong?

Upvotes: 3

Views: 2784

Answers (1)

CAustin

Reputation: 4614

You don't need to run two separate replacements to accomplish this. Just put a + quantifier in your match pattern.

Match: [-/&'() ]+

Replace with: _

Also note that I used a character set instead of alternating between the options with |. This is generally a better approach when matching any one of several individual characters.
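As for why the original pattern misbehaves: the most likely culprit is the unescaped (|) at the end, which is read as a group with two empty alternatives rather than as literal parentheses, so it can match the empty string at every position. Inside a character set the parentheses are literal and need no escaping. Here is a minimal sketch of the question's function rewritten around the single pattern above; the name replace_specials is only illustrative, and the factor handling simply mirrors what the question's function does.

```r
# Hypothetical helper name; the pattern is the one from the answer above.
replace_specials <- function(x) {
  was_factor <- is.factor(x)
  # Each run of one or more listed characters becomes a single underscore.
  # The leading "-" in the set is literal, as are "(" and ")".
  out <- gsub("[-/&'() ]+", "_", as.character(x))
  if (was_factor) factor(out) else out
}

replace_specials("rock & roll (live) - part 1/2")
# [1] "rock_roll_live_part_1_2"

replace_specials("test")
# [1] "test"
```

One thing to keep in mind: underscores already present in the input are not part of the set, so "a_ b" would still come out as "a__b". If that matters for your data, adding _ to the set ("[-/&'()_ ]+") should collapse those as well.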

Upvotes: 10
