Reputation: 11002
I want to grep for hexadecimal hashes in strings and only extract those hashes.
I've tested a regex in online regex testing tools that does the trick:
\b[0-9a-f][0-9a-f]+[0-9a-f]\b
The \b anchors mark word boundaries, so the first and the last character of a match must each be 0-9 or a-f. Since I do not know whether the hashes are 128-bit or longer, I do not know their length in advance. Therefore I put [0-9a-f]+ in the middle to match any number of [0-9a-f] characters, but at least one (no hash consists of just the two characters that sit next to the \b boundaries).
However, I noticed that
grep --only-matching -e "\b[0-9a-f][0-9a-f]+[0-9a-f]\b"
does not work in the shell, while the same regex \b[0-9a-f][0-9a-f]+[0-9a-f]\b works in online regex testing tools.
In fact, the shell version only works if I escape the quantifier + with a backslash:
grep --only-matching -e "\b[0-9a-f][0-9a-f]\+[0-9a-f]\b"
                                            ^
                                            |_ escaped +
Why does grep need this escaping in the shell?
Is there any downside to my rather simple approach?
Upvotes: 3
Views: 2259
Reputation: 21492
Grep uses basic regular expressions by default. You need to escape the + quantifier with a backslash, as the documentation says:
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
Also, there is no need for the -e option when you pass a single pattern; this is enough:
grep -o '\b[0-9a-f]\+\b' file
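The difference between the escaped and unescaped quantifier can be checked with a small sketch like the one below. It assumes GNU grep (both \b and \+ are GNU extensions to basic regular expressions), and the sample line is made up for illustration.

```shell
# Hypothetical input line containing one 32-character hex hash.
sample='md5 d41d8cd98f00b204e9800998ecf8427e ok'

# BRE with a bare +: the + is a literal plus sign, so nothing matches.
# "|| true" keeps the script going, since grep exits with status 1 on no match.
bre_plain=$(printf '%s\n' "$sample" | grep -o '\b[0-9a-f][0-9a-f]+[0-9a-f]\b' || true)

# BRE with \+: the backslash turns + into the one-or-more quantifier.
bre_escaped=$(printf '%s\n' "$sample" | grep -o '\b[0-9a-f][0-9a-f]\+[0-9a-f]\b')

# ERE (-E): a bare + is already a quantifier.
ere_plain=$(printf '%s\n' "$sample" | grep -oE '\b[0-9a-f][0-9a-f]+[0-9a-f]\b')

printf 'BRE bare +: [%s]\n' "$bre_plain"
printf 'BRE \\+:     [%s]\n' "$bre_escaped"
printf 'ERE bare +: [%s]\n' "$ere_plain"
```

Note that the shorter pattern '\b[0-9a-f]\+\b' also accepts tokens of one or two characters, whereas the pattern in the question requires at least three.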
Upvotes: 1
Reputation: 15461
The + quantifier is not part of POSIX Basic Regular Expressions (aka BRE), so you must escape it when grep runs in BRE mode.
As an alternative, you can:
- add the -E flag to grep:
  grep -E --only-matching -e "\b[0-9a-f][0-9a-f]+[0-9a-f]\b"
- replace [0-9a-f]+ with [0-9a-f][0-9a-f]* or [0-9a-f]\{1,\}
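As a rough check that these alternatives are interchangeable, the sketch below runs all three against one made-up sample line. It assumes GNU grep for \b; note that in BRE mode the interval has to be written \{1,\}, with backslashes.

```shell
# Hypothetical input line containing one 40-character hex hash.
sample='sha1 da39a3ee5e6b4b0d3255bfef95601890afd80709 done'

# Alternative 1: extended regular expressions, bare + works.
with_ere=$(printf '%s\n' "$sample" | grep -oE '\b[0-9a-f][0-9a-f]+[0-9a-f]\b')

# Alternative 2: BRE, "one or more" spelled as X followed by X*.
with_star=$(printf '%s\n' "$sample" | grep -o '\b[0-9a-f][0-9a-f][0-9a-f]*[0-9a-f]\b')

# Alternative 3: BRE interval expression, braces escaped.
with_interval=$(printf '%s\n' "$sample" | grep -o '\b[0-9a-f][0-9a-f]\{1,\}[0-9a-f]\b')

printf '%s\n' "$with_ere" "$with_star" "$with_interval"
```

All three commands should print the same hash. The X followed by X* form is the most portable of the three, since * is a quantifier in every regex flavour grep supports.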
Upvotes: 2
Reputation: 521239
I don't know why a metacharacter would need to be escaped in bash, but your regex could be rewritten like this (with -E, so that {3,} is read as an interval):
grep -E --only-matching -e "\b[0-9a-f]{3,}\b"
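The sketch below tries the interval form in both modes, again assuming GNU grep and a made-up sample line. {3,} is an interval only with -E; in default BRE mode it has to be written \{3,\}. The last command is an additional assumption, not something the thread proposes: it restricts matches to 32, 40, or 64 hex digits, which only makes sense if the possible hash lengths are known.

```shell
# Hypothetical input: an English word made only of a-f letters, a hash, and noise.
sample='decade 5d41402abc4b2a76b9719d911017c592 xyz'

# ERE: bare braces form an interval ("three or more").
ere_interval=$(printf '%s\n' "$sample" | grep -oE '\b[0-9a-f]{3,}\b')

# BRE: the same interval with escaped braces.
bre_interval=$(printf '%s\n' "$sample" | grep -o '\b[0-9a-f]\{3,\}\b')

# Assumed refinement: accept only common digest lengths (32, 40, 64 hex digits).
fixed_length=$(printf '%s\n' "$sample" | grep -oE '\b([0-9a-f]{32}|[0-9a-f]{40}|[0-9a-f]{64})\b')

printf 'open-ended:\n%s\n' "$ere_interval"
printf 'fixed lengths:\n%s\n' "$fixed_length"
```

The open-ended pattern reports "decade" as well as the hash, because any word or number built only from 0-9 and a-f counts as a match. That is the main downside of matching an unknown length.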
Upvotes: 3