Keniajin

Reputation: 1659

invalid regular expression in gsub

Why does this email regex fail with the error invalid regular expression '^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$', reason 'Invalid character range'?

blogs.smpl <- "mail:[email protected]: subject:Lorem Ipsum body:   is simply dummy text of the printing and typesetting industry. 
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"

blogs.smpl <- gsub("^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$","",blogs.smpl)

Upvotes: 1

Views: 3840

Answers (2)

Wiktor Stribiżew

Reputation: 627536

Because an unescaped - should only appear at the start or end of a character class. Anywhere else, it denotes a range between the symbol before it and the symbol after it.

The last character class is faulty: [a-zA-Z0-9-.]. It must be changed to [a-zA-Z0-9.-].

NOTE: In R, you cannot escape a hyphen inside a character class to match a literal hyphen, unless you use perl=TRUE.
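As a minimal sketch of the two options mentioned above, the snippet below checks a made-up address (`user.name@example-site.co.uk` is purely a placeholder, not from the question) against the corrected pattern with the default engine, and against a variant with an escaped hyphen under `perl = TRUE`. The variable names are assumptions for illustration.

```r
# Hypothetical address, used only for illustration.
addr <- "user.name@example-site.co.uk"

# Default (TRE) engine: the literal hyphen sits at the end of the last class.
fixed_default <- grepl("^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+$", addr)

# PCRE engine: an escaped hyphen in the middle of the class is accepted.
fixed_perl <- grepl("^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9\\-.]+$",
                    addr, perl = TRUE)

print(c(fixed_default, fixed_perl))  # [1] TRUE TRUE
```

Putting the hyphen last is usually the safer choice, since it behaves the same way with or without `perl = TRUE`.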

Also, see the R String Manipulation PDF for more information on R character classes (Page 2) and regexes in general. Here is an excerpt:

Here is a set of rules on how to match characters as regular characters inside a character class:

To match ] inside a character class put it first.

To match - inside a character class put it first or last.

To match ^ inside a character class put it anywhere but first.

To match any other character or metacharacter (but \) inside a character class put it anywhere.
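The placement rules above can be tried out on a small string. This is only an illustrative sketch using the default engine; the test string and variable names are made up for the example.

```r
x <- "a]b-c^d"

no_bracket <- gsub("[]]", "", x)    # ] placed first      -> "ab-c^d"
dash_first <- gsub("[-a]", "", x)   # - placed first      -> "]bc^d"
dash_last  <- gsub("[a-]", "", x)   # - placed last       -> "]bc^d"
no_caret   <- gsub("[a^]", "", x)   # ^ placed not first  -> "]b-cd"

# All three together: ] first, ^ in the middle, - last.
letters_only <- gsub("[]^-]", "", x)  # -> "abcd"
```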

Upvotes: 6

Dave Sexton

Reputation: 11188

The reason is this section:

[a-zA-Z0-9-.]

Try putting the dash last like so:

[a-zA-Z0-9.-]
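For illustration, here is the corrected class applied to a shortened version of the sample text. The address `user@example.com` is a placeholder, since the original one is not visible in the question. One thing worth noting, as an aside: because the pattern is anchored with ^ and $, it only matches when the whole string is an email address, so the anchored call runs without error but leaves this string unchanged. The unanchored variant is an assumption about what the question intended.

```r
# Hypothetical sample; "user@example.com" is a placeholder address.
blogs.smpl <- "mail:user@example.com: subject:Lorem Ipsum body: dummy text"

# Corrected and still anchored: no error, but nothing matches in this string.
anchored <- gsub("^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+$", "", blogs.smpl)

# Unanchored variant (assumed intent): removes the address wherever it occurs.
unanchored <- gsub("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+", "", blogs.smpl)

print(unanchored)  # [1] "mail:: subject:Lorem Ipsum body: dummy text"
```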

Upvotes: 1
