What is the difference between \\s|* and \\s|[*] in a regular expression in R?
> gsub('\\s|*','','Aug 2013*')
[1] "Aug2013*"
> gsub('\\s|[*]','','Aug 2013*')
[1] "Aug2013"
What is the function of [ ] here?
Upvotes: 6
The first expression is invalid the way you are using it, because * is a special character. If you want to use sub or gsub this way with special characters, you can set the fixed = TRUE parameter.
This treats the pattern string literally, as it is, ignoring any special meaning its characters would have in a regular expression.
See Pattern Matching and Replacement in the R documentation.
x <- 'Aug 2013****'
gsub('*', '', x, fixed=TRUE)
#[1] "Aug 2013"
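A small sketch, not part of the original answer, of what fixed = TRUE does to the pattern from the question: because the whole pattern becomes a literal string, alternation with | is no longer available. The variable names are illustrative only, and the chained version removes a plain space only, not tabs or newlines as \s would.

```r
x <- 'Aug 2013*'

# With fixed = TRUE, '\\s|*' is the literal four-character string \s|*,
# which does not occur in x, so nothing is replaced
unchanged <- gsub('\\s|*', '', x, fixed = TRUE)
print(unchanged)
#[1] "Aug 2013*"

# To drop spaces and asterisks literally, chain two fixed replacements
# (assumption: only the ordinary space character needs removing)
no_space <- gsub(' ', '', x, fixed = TRUE)
cleaned  <- gsub('*', '', no_space, fixed = TRUE)
print(cleaned)
#[1] "Aug2013"
```

The trade-off is that fixed = TRUE is simple and avoids escaping, but one call can only search for one literal string.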
Your second expression simply wraps * in a character class [] so it does not need escaping; it is the same as:
x <- 'Aug 2013*'
gsub('\\s|\\*', '', x)
#[1] "Aug2013"
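A quick check, assuming R's default regex engine, that the character-class form and the backslash-escape form give the same result on the example string:

```r
x <- 'Aug 2013*'

with_class  <- gsub('\\s|[*]', '', x)  # '*' made literal by a character class
with_escape <- gsub('\\s|\\*', '', x)  # '*' made literal by a backslash escape

print(identical(with_class, with_escape))
#[1] TRUE
```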
As for the explanation of your first expression, \\s|*:
\s whitespace (\n, \r, \t, \f, and " ")
| OR
* a quantifier with nothing before it to repeat, which is what makes the pattern invalid
And the second expression: \\s|[*]
\s whitespace (\n, \r, \t, \f, and " ")
| OR
[*] any character of: '*'
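To see the breakdown above in action, a sketch using base R's regmatches() and gregexpr() lists every substring that \\s|[*] matches in the example string (the variable names are illustrative):

```r
x <- 'Aug 2013*'

# Each match is either a whitespace character or a literal asterisk
matches <- regmatches(x, gregexpr('\\s|[*]', x))[[1]]
print(matches)
#[1] " " "*"
```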
Upvotes: 5
The [] here does nothing more than escape the *, turning it into a literal asterisk.
The first regex is invalid (* is a special character meaning "zero or more" of the preceding item, and here nothing precedes it).
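One way to see the invalidity directly, as a sketch not taken from the answers: with perl = TRUE, R hands the pattern to PCRE, which refuses to compile a * that has nothing to repeat, so gsub() raises an error instead of silently returning a string. The flag name compile_failed is illustrative.

```r
x <- 'Aug 2013*'

# PCRE compilation fails for '\\s|*'; R emits a warning with the PCRE
# message and then an error, which tryCatch turns into TRUE
compile_failed <- tryCatch({
  suppressWarnings(gsub('\\s|*', '', x, perl = TRUE))
  FALSE
}, error = function(e) TRUE)

print(compile_failed)
#[1] TRUE
```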
The second regex is equivalent to '\\s|\\*'.
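Because * loses its special meaning inside [], the alternation can also be folded into a single bracket expression. This is an alternative not suggested in the answers, sketched here with the POSIX class [:space:] standing in for \s:

```r
x <- 'Aug 2013*'

# One bracket expression: any whitespace character or a literal '*'
single_class <- gsub('[[:space:]*]', '', x)
print(single_class)
#[1] "Aug2013"
```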
Upvotes: 3