Reputation: 1659
Why is the email regex giving this error?
invalid regular expression '^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$', reason 'Invalid character range'
blogs.smpl <- "mail:[email protected]: subject:Lorem Ipsum body: is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"
blogs.smpl <- gsub("^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$","",blogs.smpl)
Upvotes: 1
Views: 3840
Reputation: 627536
Because - should only appear at the start or end of a character class. Anywhere else, it denotes a range between the symbol before it and the symbol after it.
The last character class is faulty: [a-zA-Z0-9-.]. There the second hyphen sits between 9 and ., so the engine tries to read it as a range operator and fails. It must be changed to [a-zA-Z0-9.-].
NOTE: In R, you cannot escape a hyphen inside a character class to match a literal hyphen unless you use perl=TRUE.
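As a rough illustration of the fix, the sketch below compares the faulty pattern with the corrected one, and with an escaped hyphen under perl=TRUE. The sample address and the variable names (x, bad, good, escaped) are made up for this example and do not come from the question.

```r
# Hypothetical sample input; the address is invented for illustration.
x <- c("user.name@example.co.uk", "not an email")

bad     <- "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"   # hyphen in the middle
good    <- "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+$"   # hyphen moved to the end
escaped <- "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9\\-.]+$" # escaped hyphen, needs perl = TRUE

# With the default engine the faulty class raises an error, caught here
# so the script keeps running.
bad_result <- tryCatch(grepl(bad, x), error = function(e) "error")

# The corrected class compiles and matches the sample address.
good_result <- grepl(good, x)

# Escaping the hyphen is accepted once the PCRE engine is selected.
escaped_result <- grepl(escaped, x, perl = TRUE)

print(bad_result)      # "error"
print(good_result)     # TRUE FALSE
print(escaped_result)  # TRUE FALSE
```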
Also, see the R String Manipulation PDF for more information on R character classes (Page 2) and regexes in general. Here is an excerpt:
Here is a set of rules on how to match characters as regular characters inside a character class:
To match ] inside a character class put it first.
To match - inside a character class put it first or last.
To match ^ inside a character class put it anywhere, but first.
To match any other character or metacharacter (but \) inside a character class put it anywhere.
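The placement rules quoted above can be checked with a few one-character tests. This is only a minimal sketch using the default regex engine; the test vector and the filler letter z are arbitrary choices for the example.

```r
# One string per special character, plus a plain letter as a control.
x <- c("]", "-", "^", "x")

# ] placed first is taken literally.
bracket_first <- grepl("[]]", x)

# - placed first or last is taken literally.
hyphen_first <- grepl("[-z]", x)
hyphen_last  <- grepl("[z-]", x)

# ^ placed anywhere but first is taken literally.
caret_not_first <- grepl("[z^]", x)

print(bracket_first)    # TRUE FALSE FALSE FALSE
print(hyphen_first)     # FALSE TRUE FALSE FALSE
print(hyphen_last)      # FALSE TRUE FALSE FALSE
print(caret_not_first)  # FALSE FALSE TRUE FALSE
```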
Upvotes: 6
Reputation: 11188
The reason is this section:
[a-zA-Z0-9-.]
Try putting the dash last like so:
[a-zA-Z0-9.-]
Upvotes: 1