Recently I came across the situation below while doing some homework with regular expressions.
s@ubuntu:~$ echo b | egrep []b]
b
s@ubuntu:~$ echo b | egrep [[b]
b
s@ubuntu:~$ echo b | egrep []b[]
b
s@ubuntu:~$ echo b | egrep [b[]
b
s@ubuntu:~$ echo b | egrep [[b]]
s@ubuntu:~$ echo b | egrep [b]]
s@ubuntu:~$ echo b | egrep [b\]]
s@ubuntu:~$ echo b | egrep [b\\]]
s@ubuntu:~$ echo b | egrep [\[b\]]
Why am I not getting 'b' printed in the last 5 cases?
The reason for this lies in the special rules applied inside bracket expressions: the right square bracket ] has to be placed immediately after the opening [ or [^ to be treated as a literal, and an escape character \ is treated literally inside a character class [...].
In addition, because there are no single quotes '...' or double quotes "..." around the regex, the shell processes the escape character \ before passing the expression to egrep. (Single quotes are the safer choice: inside double quotes the shell still turns \\ into \.)
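One way to see this is to print each unquoted pattern from the question after the shell has processed it, which is exactly the argument egrep receives. This is a minimal sketch assuming a POSIX shell; the set -f line is an addition not in the original, used to turn off pathname expansion (unquoted [...] words are also glob patterns) so the output does not depend on which files exist in the current directory.

```shell
# Disable pathname expansion: an unquoted [...] word is also a glob pattern,
# and a matching filename would otherwise replace the argument.
set -f

# Print each unquoted pattern from the question after quote removal,
# one per line: this is what egrep would get as its argument.
printf '%s\n' [[b]] [b]] [b\]] [b\\]] [\[b\]]
```

This should print [[b]], [b]], [b]], [b\]] and [[b]], one per line: egrep never sees the backslashes in the third and fifth patterns, and sees only one backslash in the fourth.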
Jonathan Leffler explains it well with examples; I can only add a link to the POSIX rules for bracket expressions as an overview:
http://pubs.opengroup.org/onlinepubs/007904875/basedefs/xbd_chap09.html#tag_09_03_05
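Both bracket rules can be checked with a short script. This is a sketch assuming a POSIX shell and a POSIX-conforming grep; it uses grep -E, the modern spelling of egrep, and the four-line sample input is made up for illustration.

```shell
# Hypothetical sample input: lines containing b, ], a lone backslash, and x.
input='b
]
\
x'

# ']' placed first inside the brackets is a literal member of the set:
# this prints the lines "b" and "]".
printf '%s\n' "$input" | grep -E '[]b]'

# In a negated class the literal ']' goes right after '[^':
# this prints the lines containing anything other than b or ], i.e. "\" and "x".
printf '%s\n' "$input" | grep -E '[^]b]'

# A backslash inside brackets is an ordinary character, not an escape:
# '[\]' is a one-member set containing just the backslash.
printf '%s\n' "$input" | grep -E '[\]'
```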
UPDATE
The same expressions with quotes:
# this matches 'b]' or '\]'
~$ echo b] | egrep '[b\]]'
b]
~$ echo '\]' | egrep '[b\]]' # note the quotes on both sides of the pipe
\]
# the next one is equivalent to '[b\]]'
# because a doubled \ inside a character class is redundant
~$ echo b] | egrep '[b\\]]'
b]
~$ echo '\]' | egrep '[b\\]]'
\]
# the last one matches '\]' or '[]' or 'b]'
~$ echo b] | egrep '[\[b\]]'
b]
~$ echo [] | egrep '[\[b\]]'
[]
~$ echo '\]' | egrep '[\[b\]]'
\]
# without quotes in the echo section, the shell consumes the escape \,
# so egrep receives only a closing bracket ']' and nothing is printed
~$ echo \] | egrep '[\[b\]]'
# if we instead remove the quotes from the egrep section,
# the regex becomes equivalent to [[b]], so it now matches '[]' or 'b]' but no longer '\]'
~$ echo '\]' | egrep [\[b\]]
~$ echo '[]' | egrep [\[b\]]
[]
~$ echo 'b]' | egrep [\[b\]]
b]
egrep [[b]] looks for a b or [ followed by a ]; not found, since the input b contains no ].
egrep [b]] looks for a b followed by a ]; not found.
egrep [b\]] looks for a b followed by a ]; not found. The backslash is removed by the shell and never seen by egrep.
egrep [b\\]] looks for a b or a backslash followed by ]; not found.
egrep [\[b\]] looks for a b or a [ followed by ]; not found. The backslashes are removed by the shell and never seen by egrep.
Inside a character class (started by [), the first ] terminates the class unless the ] is the first character after the [, or the first character after the [^ for a negated character class. Note that ] is not a regex metacharacter unless there is a preceding [ making it into the end of a character class. Other characters are context-dependent too: $ is not a metacharacter in the middle of a string, nor ^ unless it appears at the start (strictly, this holds for basic regular expressions; in extended ones both are anchors anywhere outside brackets), nor *, + or ? if they appear first (POSIX leaves that case undefined for extended regular expressions, so implementations differ), etc. See POSIX Regular Expressions for a detailed discussion; the regular expressions handled by egrep (now grep -E) are 'extended regular expressions'.
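To see where a class ends, the -o option (a GNU and BSD grep extension, not part of POSIX) prints only the matched text. A small sketch, assuming such a grep; the input strings are made up for illustration:

```shell
# '[b]]' parses as the one-member class [b] followed by a literal ']',
# so -o shows that the matched text is the two characters "b]".
printf '%s\n' 'xb]y' | grep -oE '[b]]'

# With ']' first, '[]b]' is a single class containing ']' and 'b';
# -o prints each single-character match on its own line: "b", then "]".
printf '%s\n' 'xb]y' | grep -oE '[]b]'

# A lone ']' with no preceding '[' is not a metacharacter at all.
printf '%s\n' 'a]b' | grep -oE ']'
```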
The shell messes around with backslashes before egrep gets a chance to see them. You should enclose your regex in single quotes to avoid the shell altering what egrep sees.
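The difference between single and double quotes can be seen with printf, whose %s format prints each argument without interpreting backslashes. A minimal sketch assuming a POSIX shell:

```shell
# Single quotes: the shell passes every character through unchanged.
printf '%s\n' '[b\\]]'

# Double quotes: "\\" is still collapsed to a single backslash.
printf '%s\n' "[b\\]]"

# Double quotes: a backslash before ']' has no special meaning, so it is kept.
printf '%s\n' "[b\]]"
```

Only the single-quoted form reaches the program as [b\\]]; both double-quoted forms arrive as [b\]].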
You can demonstrate my analysis by changing what is echoed:
echo '[b]' | egrep [[b]]
echo '[b]' | egrep [b]]
echo '[b]' | egrep [b\]]
echo '[b]' | egrep [b\\]]
echo '[b]' | egrep [\[b\]]
The output from that is:
[b]
[b]
[b]
[b]
[b]
The [ in these examples (in the echoed data) is present for cosmetic reasons; it could be omitted and the lines would still be matched.