Reputation: 604
I'm using Linux Mint and trying to pattern match with the grep
command. I have read through some tutorials, and they state that a telephone number of 3 digits followed by a hyphen followed by 4 digits, e.g. 123-4567, matches the pattern
[0-9]{3}-[0-9]{4}
Okay, so I understand what that regex is saying, but the problem is that it doesn't work at all. I have found that the solution is actually
[0-9]\{3\}-[0-9]\{4\}
Now I am really confused. I thought that backslash (\) was an escape character and there is nothing there I want to escape. This second pattern works, and I have no idea why. The one I was taught in my class and find on many tutorials does not work at all. Please someone help me understand what the deal is here.
Upvotes: 1
Views: 173
Reputation: 604
So it turns out that grep traditionally implements POSIX Basic Regular Expressions (BRE) and not Extended Regular Expressions (ERE). The difference is a matter of metacharacters. In BRE only ^ $ . [ ] *
are meta; all others are considered literals. ERE adds the metacharacters ( ) { } ? + |
and their associated functions. Since grep without any additional options uses BRE, you actually have to add a backslash to ( ) { }
for them to be considered metacharacters. This is completely backwards from ERE, where prepending the backslash causes them to be treated as literals. Alternatively, you can run grep with the -E
option for it to use Extended Regular Expressions, or use the egrep
command. To make this a little less wordy and a little clearer...
grep '[0-9]\{3\}-[0-9]\{4\}'
produces the same result as
grep -E '[0-9]{3}-[0-9]{4}'
which produces the same result as
egrep '[0-9]{3}-[0-9]{4}'
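To see the "backwards" escaping rule in isolation, here is a minimal sketch, assuming GNU grep as shipped with Linux Mint. The helper name sample and its two-line input are made up purely for illustration. One input line contains literal braces, the other contains three a's in a row.

```shell
# Hypothetical helper: prints one line with literal braces and one with three a's.
sample() { printf 'a{3}\naaa\n'; }

# BRE (default): unescaped braces are ordinary characters.
sample | grep 'a{3}'        # prints: a{3}

# BRE: escaped braces form the repetition operator.
sample | grep 'a\{3\}'      # prints: aaa

# ERE (-E): unescaped braces form the repetition operator.
sample | grep -E 'a{3}'     # prints: aaa

# ERE (-E): escaped braces are ordinary characters.
sample | grep -E 'a\{3\}'   # prints: a{3}
```

The single quotes matter: without them the shell would strip the backslashes before grep ever sees the pattern.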
Upvotes: 1
Reputation: 12448
Very briefly,
grep
uses POSIX basic regex by default, in which you need to escape a handful of characters such as {
, }
, |
, +
, ?
, (
, )
. Note that [
, ]
are not required to be escaped!!
As escaping all of those characters is a pain, you can use extended regex with grep -E '[0-9]{3}-[0-9]{4}'
or Perl-compatible regex with grep -P '[0-9]{3}-[0-9]{4}'
. Both basic and extended regex support POSIX character classes such as [[:alnum:]]
for alphanumeric characters, etc. Perl regex are far more powerful, since they allow lookbehind and lookahead as well as many other constructs. Note that -P is a GNU grep option and may not be available in other grep implementations.
$ echo '123-4567' | grep '[0-9]{3}-[0-9]{4}'
>>> NO OUTPUT, as the regex matches 1 digit followed literally by {3}- followed by 1 digit and literally {4}
$ echo '123-4567' | grep '[0-9]\{3\}-[0-9]\{4\}'
123-4567
$ echo '123-4567' | grep -P '[0-9]{3}-[0-9]{4}'
123-4567
$ echo '123-4567' | grep -E '[0-9]{3}-[0-9]{4}'
123-4567
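For the lookbehind, lookahead and character-class features mentioned above, a short sketch may help. It assumes GNU grep built with PCRE support (neither -P nor -o is part of POSIX grep), and the "tel: " prefix is just made-up sample input.

```shell
# Lookbehind (PCRE only): print the number only when it follows "tel: ".
# -o prints just the matched part of the line instead of the whole line.
echo 'tel: 123-4567' | grep -oP '(?<=tel: )[0-9]{3}-[0-9]{4}'   # prints: 123-4567

# Lookahead (PCRE only): match the 3-digit prefix only when "-4567" follows it.
echo '123-4567' | grep -oP '[0-9]{3}(?=-4567)'                  # prints: 123

# POSIX character class used with the extended syntax.
echo '123-4567' | grep -E '[[:digit:]]{3}-[[:digit:]]{4}'       # prints: 123-4567
```

The lookaround parts are checked but not included in the match, which is why the first command prints only the number and the second only the prefix.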
Upvotes: 0