N1h1l1sT

Reputation: 105

Matching a RegEx with all special characters

So, I've searched for a way to build a regex that matches all special characters in a string, but I haven't been able to achieve my noble enterprise.

I've tried to work out which characters need to be backslashed, but sometimes it takes one backslash, sometimes two, and sometimes four. I've also come across an R package that provides a correctly escaped constant for each of them, so that you type "BACKSLASH" and it produces the right escape, but I don't know the English names of all the characters I want to remove. I've also seen str_replace_all(x, "[[:punct:]]", " "), but I'm not sure it works the way I need it to.

I understand that it's a really basic (stupid, even) question, especially for people who know regex, but I'd really appreciate an answer.

To make a long story short, I have a variable DirtyChars = c(',', '.', ';', '?', '/', '\\', '`', '[', ']', '"', ':', '>', '<', '|', '-', '_', '=', '+', '(', ')', '^', '{', '}', '~', '\'', '*', '&', '%', '$', '!', '@', '#') and I want to build a regex that matches any of its elements.

I tried implode(DirtyChars, sep = "|") and paste("[", implode(DirtyChars, sep = "|"), "]", sep = ""), but neither works; I'm looking for one that actually does.

Upvotes: 1

Views: 858

Answers (1)

Wiktor Stribiżew

Reputation: 626804

You may build a character class dynamically out of this character vector and use it to match those chars later:

```r
DirtyChars = c(',', '.', ';', '?', '/', '\\', '`', '[', ']', '"', ':', '>', '<', '|', '-', '_', '=', '+', '(', ')', '^', '{', '}', '~', '\'', '*', '&', '%', '$', '!', '@', '#')
s <- "#w$o;r&d^$"
# Prepend a backslash to ], ^, \ and -, the chars that are special inside a PCRE character class
escape_for_char_class <- function(s) {gsub("([]^\\\\-])", "\\\\\\1", s)}
# Collapse the vector into one string, escape it, and wrap it in [...]
pattern <- paste0("[", escape_for_char_class(paste(DirtyChars, collapse="")), "]")
pattern
## [1] "[,.;?/\\\\`[\\]\":><|\\-_=+()\\^{}~'*&%$!@#]"
gsub(pattern, "", s, perl=TRUE)
## [1] "word"
```

See the R demo.
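A minimal sketch, not part of the original answer, that wraps the same steps in a reusable helper so the character class is rebuilt from whatever vector is passed in. The name remove_chars is hypothetical; the escaping function and the perl = TRUE call are the ones from the answer.

```r
# Escape the four characters that are special inside a PCRE character class
escape_for_char_class <- function(s) gsub("([]^\\\\-])", "\\\\\\1", s)

# Hypothetical helper: delete every character listed in `chars` from `x`
remove_chars <- function(x, chars) {
  pattern <- paste0("[", escape_for_char_class(paste(chars, collapse = "")), "]")
  gsub(pattern, "", x, perl = TRUE)  # PCRE understands the backslash escapes
}

DirtyChars <- c(',', '.', ';', '?', '/', '\\', '`', '[', ']', '"', ':', '>', '<',
                '|', '-', '_', '=', '+', '(', ')', '^', '{', '}', '~', '\'', '*',
                '&', '%', '$', '!', '@', '#')

remove_chars("#w$o;r&d^$", DirtyChars)                     # "word"
remove_chars(paste(DirtyChars, collapse = ""), DirtyChars)  # "" (every char removed)
remove_chars("a-b]c", c("-", "]"))                          # "abc"
```

The last call shows why the escaping matters: an unescaped - or ] inside [...] would otherwise be read as a range operator or as the end of the class.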

The escape_for_char_class function prepends a backslash to the ^, \, - and ] chars, which are the ones that must be escaped inside a character class in a PCRE regex. The escaped characters are then wrapped in [...], forming a character class that matches any single char listed in it. The final gsub call must use perl=TRUE because the pattern relies on PCRE escaping; the default TRE engine does not support all of these backslash escapes inside a bracket expression.
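For comparison, an escape-free alternative (not from the answer; a sketch assuming base R only): remove each character as a literal string with fixed = TRUE, folding over the vector with Reduce, so no regex escaping is needed at all. The helper name remove_literal is hypothetical. The block also cross-checks the result against the character-class approach above. Since DirtyChars happens to be the 32 ASCII punctuation characters, [[:punct:]] may look like a shortcut, but its exact coverage depends on the regex engine and locale, so an explicit list is the safer choice when the set must be exact.

```r
DirtyChars <- c(',', '.', ';', '?', '/', '\\', '`', '[', ']', '"', ':', '>', '<',
                '|', '-', '_', '=', '+', '(', ')', '^', '{', '}', '~', '\'', '*',
                '&', '%', '$', '!', '@', '#')

# Hypothetical helper: strip each char as a literal (fixed = TRUE), one pass per char
remove_literal <- function(x, chars) {
  Reduce(function(acc, ch) gsub(ch, "", acc, fixed = TRUE), chars, x)
}

s <- "#w$o;r&d^$"
remove_literal(s, DirtyChars)                     # "word"
remove_literal(c("a.b", "c,d"), DirtyChars)       # c("ab", "cd")

# Cross-check against the escaped character class from the answer
escape_for_char_class <- function(s) gsub("([]^\\\\-])", "\\\\\\1", s)
pattern <- paste0("[", escape_for_char_class(paste(DirtyChars, collapse = "")), "]")
identical(remove_literal(s, DirtyChars), gsub(pattern, "", s, perl = TRUE))  # TRUE
```

This makes one pass over the input per character, so the single character-class regex is likely the faster option on large inputs; the fixed-string version trades speed for not having to think about escaping.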

Upvotes: 1
