Reputation: 31171
I'm trying to understand the [] syntax with extended regular expressions in grep.
The following two patterns are equivalent:
$ echo "foo_bar" | grep -E "[a-z_]+$"
foo_bar
$ echo "foo_bar" | grep -E "[_a-z]+$"
foo_bar
However, these two are not:
$ echo "foobar[]" | grep -E "[a-z_\[\]]+$"
foobar[]
$ echo "foobar[]" | grep -E "[a-z\[\]_]+$"
Why is this? Is this documented anywhere? I couldn't see anything in man grep about this.
Upvotes: 3
Views: 126
Reputation: 20843
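Be careful when combining double quotes " and backslashes \, since Bash processes some backslashes inside double quotes before grep ever sees the pattern. It only does so when the backslash precedes $, `, ", \ or a newline, so here \[ and \] are passed through unchanged and grep still receives [a-z_\[\]]+$. Single quotes avoid the question entirely, and for the remainder of this answer I assume that you had used single quotes.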
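A quick way to see exactly which pattern the shell hands to grep is to print the quoted string instead of running grep. The sketch below does this for both quoting styles; the helper name show_arg is made up for illustration.

```shell
# show_arg is a hypothetical helper: it prints its first argument exactly
# as the shell passed it, i.e. what grep would receive as the pattern.
show_arg() {
  printf '%s\n' "$1"
}

show_arg "[a-z_\[\]]+$"   # double quotes
show_arg '[a-z_\[\]]+$'   # single quotes
```

Comparing the two printed lines shows whether the quoting changed the pattern.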
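In the first case you have the character class [a-z_\[\], which matches the characters a-z, _, \ and [. The final \] does not add ] to the class; it is another \ followed by the closing bracket of the class. The ]+ that follows is therefore a literal ] outside the class, which is why foobar[] still matches. In the second case the class ends at the same place, leaving _]+$ outside it, so the line would have to end in _ followed by one or more ]. Notice how: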
$ echo "foobar[]" | grep -E '[a-z\[\]+\]+$'
foobar[]
$ echo '\' | grep -E '[\]$'
\
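To make the parsing concrete, here is a small sketch that tests the class on its own, one character at a time, and then the second pattern from the question. The helper name matches is hypothetical, and the expected results in the comments assume grep's POSIX bracket-expression rules as described above.

```shell
# matches is a hypothetical helper: prints "yes" if the extended regex $1
# matches the string $2, otherwise "no".
matches() {
  if printf '%s\n' "$2" | grep -Eq -- "$1"; then
    echo yes
  else
    echo no
  fi
}

# Membership of the class [a-z_\[\] on its own, anchored to one character.
matches '^[a-z_\[\]$' '['    # yes
matches '^[a-z_\[\]$' '\'    # yes
matches '^[a-z_\[\]$' ']'    # no: ] closes the class, it is not a member

# The second pattern is the class [a-z\[\] followed by a literal _ and
# one or more literal ], so the line has to end in "_]".
matches '[a-z\[\]_]+$' 'foobar[]'    # no
matches '[a-z\[\]_]+$' 'foobar[_]'   # yes
```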
If you want to add ] you have to list it first, that is, []] matches a single ].
$ echo "]" | grep -E '[]]$'
]
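Applied to the input from the question, one possible class that contains both brackets puts ] first and [ right after it. This is a sketch of one ordering that works, not the only one; the variable name pattern is just for illustration.

```shell
# ] comes first so it is literal; the [ after it is an ordinary member.
pattern='^[][a-z_]+$'

echo 'foobar[]' | grep -E "$pattern"   # prints foobar[]
echo 'foo_bar'  | grep -E "$pattern"   # prints foo_bar
```

With this class the position of _ and a-z no longer matters, because ] is no longer closing the class early.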
For a reference, see man grep:
To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.
as well as https://www.regular-expressions.info/charclass.html
In most regex flavors, the only special characters or metacharacters inside a character class are the closing bracket ], the backslash \, the caret ^, and the hyphen -. The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash. To search for a star or plus, use [+*]. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
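The three placement rules from man grep can be combined in a single class. The sketch below uses a hypothetical helper in_class and the class []^-], where ] is first, ^ is not first and - is last. Note that in grep's bracket expressions the backslash is an ordinary character, as the [\] example above shows, so it is not one of the characters that need special placement here.

```shell
# in_class is a hypothetical helper: prints "yes" if the single character $1
# is matched by the class []^-], otherwise "no".
in_class() {
  if printf '%s\n' "$1" | grep -Eq '^[]^-]$'; then
    echo yes
  else
    echo no
  fi
}

in_class ']'   # yes
in_class '^'   # yes
in_class '-'   # yes
in_class 'a'   # no
```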
Some more test cases, examining [\s] (which is the same as [s\] and different from [[:space:]]):
$ echo 'a ' | grep -E 'a[\s]$'
$ echo 's' | grep -E '[\s]$'
s
$ echo '\' | grep -E '[\s]$'
\
$ echo 'a ' | grep -E 'a[[:space:]]$'
a
So the takeaway is: Order does not matter when listing characters of a character class, except when it does.
Upvotes: 2