daniel451

Reputation: 11002

Grep hashes via regex in bash

I want to grep for hexadecimal hashes in strings and only extract those hashes.

I've tested a regex in online regex testing tools that does the trick:

\b[0-9a-f][0-9a-f]+[0-9a-f]\b

The \b anchors mark word boundaries, so the match must start and end with a character from 0-9 or a-f. Since I do not know whether the hashes are 128-bit or longer, I do not know their length in advance. I therefore put [0-9a-f]+ in the middle to match any number of hex digits, but at least one (no hash consists of just the two characters next to the \b boundaries).

However, I noticed that

grep --only-matching -e "\b[0-9a-f][0-9a-f]+[0-9a-f]\b"

does not work in the shell, while the regex \b[0-9a-f][0-9a-f]+[0-9a-f]\b works in online regex testing tools.

In fact, the shell version only works if I escape the + quantifier with a backslash:

grep --only-matching -e "\b[0-9a-f][0-9a-f]\+[0-9a-f]\b"
                                           ^
                                           |_ escaped +

Why does grep need this escaping in the shell?

Are there any downsides to my rather simple approach?

Upvotes: 3

Views: 2259

Answers (3)

Ruslan Osmanov

Reputation: 21492

Grep uses basic regular expressions (BRE) by default. In BRE you need to escape the + quantifier with a backslash, as the documentation says:

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

Also, there is no need for the -e option; this is enough:

grep -o '\b[0-9a-f]\+\b' file
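The difference between the unescaped and the escaped quantifier can be checked with a small experiment. This is only a sketch: it assumes GNU grep (for \b and \+), and the sample line and variable names are made up for illustration.

```shell
# Illustrative input line containing one 32-digit hex string.
line='checksum d41d8cd98f00b204e9800998ecf8427e ok'

# BRE (grep's default): an unescaped + is a literal plus sign, so nothing
# matches here and grep exits non-zero (hence the "|| true").
bre_plain=$(printf '%s\n' "$line" | grep -o '\b[0-9a-f][0-9a-f]+[0-9a-f]\b' || true)

# BRE with \+ : "one or more" hex digits in the middle, as in the question.
bre_escaped=$(printf '%s\n' "$line" | grep -o '\b[0-9a-f][0-9a-f]\+[0-9a-f]\b' || true)

printf 'plain:   [%s]\n' "$bre_plain"
printf 'escaped: [%s]\n' "$bre_escaped"
```

With GNU grep, only the second command should print the hash; the first finds nothing because the line contains no literal + character.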

Upvotes: 1

SLePort

Reputation: 15461

The + quantifier is not part of POSIX Basic Regular Expressions (BRE), so with grep in BRE mode you must write it as \+ (a GNU extension).

As an alternative, you can:

  • add the -E flag to grep:
    grep -E --only-matching -e "\b[0-9a-f][0-9a-f]+[0-9a-f]\b"
  • use [0-9a-f][0-9a-f]* or [0-9a-f]\{1,\} (in BRE the interval braces must be backslashed)
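As a rough check, the alternatives above can be run side by side. The sketch below assumes GNU grep; the input line and variable names are illustrative only. Note that in BRE mode the interval has to be written with backslashed braces.

```shell
# Illustrative input: one 7-digit hex string among ordinary words.
line='commit 9fceb02 by alice'

# ERE via -E: + works unescaped.
ere=$(printf '%s\n' "$line" | grep -E -o '\b[0-9a-f][0-9a-f]+[0-9a-f]\b')

# BRE: "x+" rewritten as "xx*".
star=$(printf '%s\n' "$line" | grep -o '\b[0-9a-f][0-9a-f][0-9a-f]*[0-9a-f]\b')

# BRE: "x+" rewritten as the interval x\{1,\}.
interval=$(printf '%s\n' "$line" | grep -o '\b[0-9a-f][0-9a-f]\{1,\}[0-9a-f]\b')

printf '%s\n' "$ere" "$star" "$interval"
```

All three variants should extract the same string, so the choice is mostly a matter of readability and of how portable the command needs to be.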

Upvotes: 2

Tim Biegeleisen

Reputation: 521239

I don't know why a metacharacter would need to be escaped in bash, but your regex could be rewritten like this (the -E flag is needed so that {3,} is treated as an interval):

grep -E --only-matching -e "\b[0-9a-f]{3,}\b"
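One possible downside of a three-character minimum is that ordinary words made only of the letters a-f also match. A sketch of the effect, assuming GNU grep with -E; extract_hashes is a hypothetical helper, and the minimum of 32 is derived from the question's "128bit or higher" (128 bits / 4 bits per hex digit = 32 digits):

```shell
# Hypothetical helper: print every hex run of at least $1 digits read from stdin.
extract_hashes() {
    grep -E -o "\\b[0-9a-f]{$1,}\\b"
}

# Illustrative input: two English words that happen to be valid hex, plus a hash.
line='bad cafe d41d8cd98f00b204e9800998ecf8427e'

# Minimum of 3 digits: the words are reported as well.
loose=$(printf '%s\n' "$line" | extract_hashes 3)

# Minimum of 32 digits (128 bits): only the hash remains.
strict=$(printf '%s\n' "$line" | extract_hashes 32)

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Raising the lower bound is a cheap way to cut such false positives. Uppercase hashes would still be missed unless the class is extended to [0-9a-fA-F] or grep's -i option is added.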

Upvotes: 3
