showkey

Reputation: 358

The difference between `\\s|*` and `\\s|[*]` in a regular expression in R?

What is the difference between `\\s|*` and `\\s|[*]` in a regular expression in R?

> gsub('\\s|*','','Aug 2013*')
[1] "Aug2013*"
> gsub('\\s|[*]','','Aug 2013*')
[1] "Aug2013"

What is the function of [ ] here?

Upvotes: 6

Views: 3572

Answers (2)

hwnd

Reputation: 70722

The first expression is invalid in the way you are using it, because * is a special character (a quantifier). If you want sub or gsub to treat special characters literally, you can set the fixed = TRUE parameter.

This takes the pattern string exactly as written and ignores the special meaning of any characters in it.

See Pattern Matching and Replacement in the R documentation.

x <- 'Aug 2013****'
gsub('*', '', x, fixed=TRUE)
#[1] "Aug 2013"
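As a minimal sketch of the point above, the following compares a literal match with fixed = TRUE against the escaped regex form. The variable names (literal, escaped, both) are only illustrative and do not come from the original question.

```r
x <- 'Aug 2013****'

# fixed = TRUE: the pattern '*' is matched as a literal asterisk.
literal <- gsub('*', '', x, fixed = TRUE)

# Regex equivalent: escape the asterisk so it loses its special meaning.
escaped <- gsub('\\*', '', x)

# fixed = TRUE also disables alternation (|), so removing the space
# as well needs a second call.
both <- gsub(' ', '', literal, fixed = TRUE)

print(literal)  # "Aug 2013"
print(escaped)  # "Aug 2013"
print(both)     # "Aug2013"
```

Because fixed = TRUE switches off every regex feature, it suits single literal patterns; for "whitespace or asterisk" in one call, a regex with an escaped or bracketed * is the simpler choice.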

Your second expression just puts * inside a character class [] so that it does not need to be escaped, which is the same as:

x <- 'Aug 2013*'
gsub('\\s|\\*', '', x)
#[1] "Aug2013"

As for the explanation of your first expression: \\s|*

\s      whitespace (\n, \r, \t, \f, and " ")
|       OR
*       quantifier with nothing before it to repeat (not a literal '*')

And the second expression: \\s|[*]

\s      whitespace (\n, \r, \t, \f, and " ")
|       OR
[*]     any character of: '*'
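To make the breakdown concrete, here is a small sketch that runs a few equivalent ways of writing "whitespace or a literal asterisk" on the string from the question. The third pattern, which uses the POSIX class [:space:] inside the brackets, and the extra string 'a*b+c?' are additions for illustration only.

```r
x <- 'Aug 2013*'

# Three ways to say "whitespace or a literal asterisk".
patterns <- c('\\s|[*]', '\\s|\\*', '[[:space:]*]')

results <- vapply(
  patterns,
  function(p) gsub(p, '', x),
  character(1),
  USE.NAMES = FALSE
)
print(results)  # "Aug2013" "Aug2013" "Aug2013"

# A character class can list several special characters at once.
cleaned <- gsub('[*+?]', '', 'a*b+c?')
print(cleaned)  # "abc"
```

Inside [], characters such as *, + and ? are ordinary, which is why the bracket form avoids the double backslash.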

Upvotes: 5

Paul Draper

Reputation: 83235

The only purpose of [] here is to turn the * into a literal asterisk.

The first regex is invalid (* is a special character meaning "zero or more" of the preceding item, and here nothing precedes it).

The second regex is equivalent to

'\\s|\\*'
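One way to check this is to ask the PCRE engine to compile both patterns with perl = TRUE. This is only a sketch: I would expect PCRE to reject the bare *, while the default engine returned a result without complaint in the question, and the exact error text may vary between R versions. The names res, is_error and valid are illustrative.

```r
x <- 'Aug 2013*'

# Assumption: with perl = TRUE the dangling '*' is a compile error.
# tryCatch returns the condition object instead of stopping the script.
res <- tryCatch(
  gsub('\\s|*', '', x, perl = TRUE),
  error = function(e) e
)
is_error <- inherits(res, "error")
print(is_error)

# The escaped form compiles and removes both the space and the asterisk.
valid <- gsub('\\s|\\*', '', x, perl = TRUE)
print(valid)  # "Aug2013"
```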

Upvotes: 3
