bmcentee148

Reputation: 604

What don't I understand with regular expression on Linux?

I'm using Linux Mint and trying to pattern match with the grep command. I have read through some tutorials, and they state that a telephone number of 3 digits followed by a hyphen followed by 4 digits, e.g. 123-4567, matches the pattern

[0-9]{3}-[0-9]{4}

Okay so I understand what that regex is saying but the problem is it doesn't work at all. I have found the solution is actually

[0-9]\{3\}-[0-9]\{4\}

Now I am really confused. I thought that backslash (\) was an escape character and there is nothing there I want to escape. This second pattern works, and I have no idea why. The one I was taught in my class and find on many tutorials does not work at all. Please someone help me understand what the deal is here.

Upvotes: 1

Views: 173

Answers (2)

bmcentee148

Reputation: 604

So it turns out that grep traditionally implements POSIX Basic Regular Expressions (BRE) and not Extended Regular Expressions (ERE). The difference is a matter of metacharacters. In BRE only ^ $ . [ ] * are meta; all others are considered literals. ERE adds the metacharacters ( ) { } ? + | and their associated functions. Since grep without any additional options uses BRE, you actually have to add a backslash to ( ) { } for them to be considered metacharacters. This is completely backwards from ERE, where prepending the backslash causes the character to be treated as a literal. Alternatively, you can run grep with the -E option, or use the egrep command, to get Extended Regular Expressions. To make this a little less wordy and more clear...

grep '[0-9]\{3\}-[0-9]\{4\}'

produces the same result as

grep -E '[0-9]{3}-[0-9]{4}'

which produces the same result as

egrep '[0-9]{3}-[0-9]{4}'
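
One way to check that the escaped BRE form and the -E form agree is to feed the same lines to both. This is only a minimal sketch, assuming GNU grep as shipped with Linux Mint; the sample text and the variable names (sample, bre, ere) are made up for illustration.

```shell
# Sample input (hypothetical): only the second line contains a 3-4 digit number.
sample='no number here
call 123-4567 today
12-345 is too short'

# BRE (plain grep): braces must be backslash-escaped to act as repetition counts.
# The pattern is single-quoted so the shell does not strip the backslashes.
bre=$(printf '%s\n' "$sample" | grep '[0-9]\{3\}-[0-9]\{4\}')

# ERE (grep -E): braces are metacharacters without any escaping.
ere=$(printf '%s\n' "$sample" | grep -E '[0-9]{3}-[0-9]{4}')

echo "BRE: $bre"
echo "ERE: $ere"
```

Both variables should hold the same single line. egrep is left out of the sketch because recent GNU grep releases print a warning that it is obsolescent in favor of grep -E.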

Upvotes: 1

Allan

Reputation: 12448

Very briefly,

By default, grep uses POSIX basic regular expressions (BRE), in which you need to backslash-escape characters such as {, }, ( and ) to give them their special meaning (GNU grep extends this to |, + and ?). Note that [ and ] do not need to be escaped!

As escaping all of those characters is a pain, you can use extended regex with grep -E '[0-9]{3}-[0-9]{4}' or Perl regex with grep -P '[0-9]{3}-[0-9]{4}'. Both basic and extended regex support POSIX character classes such as [[:alnum:]] for alphanumeric characters. Perl regex are considerably more powerful, since they allow lookbehind and lookahead as well as many other predefined shorthands.
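
As a small illustration of character classes with extended regex, the sketch below pulls just the number out of a line. It assumes GNU grep (the -o option is a common extension, not part of POSIX), and the input line and the variable names line and number are hypothetical.

```shell
# Hypothetical input line containing one phone number.
line='Office: 555-0199 (ask for Sam)'

# [[:digit:]] is the POSIX class for decimal digits;
# -o prints only the matched text instead of the whole line.
number=$(echo "$line" | grep -E -o '[[:digit:]]{3}-[[:digit:]]{4}')

echo "$number"
```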

$ echo '123-4567' | grep '[0-9]{3}-[0-9]{4}'
>>> NO OUTPUT: in basic regex this pattern matches 1 digit followed literally by {3}-, then 1 digit followed literally by {4}
$ echo '123-4567' | grep '[0-9]\{3\}-[0-9]\{4\}'
123-4567
$ echo '123-4567' | grep -P '[0-9]{3}-[0-9]{4}'
123-4567
$ echo '123-4567' | grep -E '[0-9]{3}-[0-9]{4}'
123-4567
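
To see the literal-brace behaviour directly, the unescaped pattern can be run against a string that really contains braces. Again, this is just a sketch assuming GNU grep; the test string and the variable names literal, bre_hit and bre_count are invented for the demonstration.

```shell
# A string containing literal braces (made up for this demonstration).
literal='5{3}-7{4}'

# In basic regex, { and } are ordinary characters, so this pattern means:
# one digit, the text "{3}-", one digit, the text "{4}".
bre_hit=$(echo "$literal" | grep '[0-9]{3}-[0-9]{4}')

# -c prints the number of matching lines; a real phone number does not match.
# "|| true" is there because grep exits with status 1 when nothing matches.
bre_count=$(echo '123-4567' | grep -c '[0-9]{3}-[0-9]{4}' || true)

echo "literal-brace input matched: $bre_hit"
echo "phone number match count: $bre_count"
```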


Upvotes: 0
