Kiran Vemuri

Reputation: 3032

Scope of grep with regular expressions

I am trying to use the following regular expression with the Linux grep command:

(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))

When I try it out at https://www.regextester.com with the contents of a file, I get the required result, i.e., the required fields are matched, but when I use it as

grep '(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))' file1

grep prints nothing at all!

What's the problem here?

Upvotes: 7

Views: 956

Answers (3)

kdhp

Reputation: 2116

grep(1) uses POSIX Basic Regular Expressions by default, and POSIX Extended Regular Expressions when used with the -E option.

In POSIX regular expressions, escaping a non-special character, e.g. \s, has undefined behaviour, and there is no syntax for non-greedy matching, e.g. +?. Furthermore, in BREs the + and | operators are not available, and parentheses must be escaped to perform grouping.
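A quick way to see the BRE/ERE difference, as a minimal sketch assuming a POSIX shell with grep on the PATH (the two sample lines are made up for illustration): in a BRE the | is an ordinary character, so the pattern only matches the literal text "foo|bar", while -E turns it into alternation. Passing one -e per alternative is the portable way to get alternation with plain BREs.

```shell
#!/bin/sh
# Hypothetical sample input: two lines, "foo" and "bar"
sample='foo
bar'

# BRE: '|' is an ordinary character, so this searches for the literal text "foo|bar".
# grep exits with status 1 when nothing matches, hence "|| true".
bre_plain=$(printf '%s\n' "$sample" | grep 'foo|bar' || true)

# ERE: '|' is the alternation operator, so both lines are selected
ere=$(printf '%s\n' "$sample" | grep -E 'foo|bar')

# Portable BRE alternative: one -e option per alternative
bre_multi=$(printf '%s\n' "$sample" | grep -e 'foo' -e 'bar')

printf 'BRE:      [%s]\n' "$bre_plain"
printf 'ERE:      [%s]\n' "$ere"
printf 'BRE (-e): [%s]\n' "$bre_multi"
```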

The POSIX character classes [[:space:]] and [[:alnum:]_] are portable alternatives to \s and \w respectively.

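Excluding the next matching character from a repetition can be used to emulate non-greedy matching, e.g. [^*]+?\w*: can be approximated by [^*[:alnum:]_:]+[[:alnum:]_]*:. The two are not strictly identical, since the second form requires at least one character that is neither a word character nor a colon before the word.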
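Greediness only becomes visible when grep prints the matched text itself, so the sketch below uses -o (an extension provided by GNU and BSD grep, not POSIX). The input line is hypothetical and contains two colons: the greedy [^*]+ runs on to the last colon, while the emulated form stops at the first one, which is where PCRE's non-greedy [^*]+?\w*: would stop too.

```shell
#!/bin/sh
# Hypothetical input line containing two colons
line='* [ ] Name: x: 1'

# Greedy: [^*]+ may contain ':' and word characters, so here the match
# extends to the last colon on the line
greedy=$(printf '%s\n' "$line" | grep -o -E '\[ \][^*]+[[:alnum:]_]*:')

# Emulated non-greedy: the run may not contain ':' or word characters,
# so the match ends at the first colon
lazy=$(printf '%s\n' "$line" | grep -o -E '\[ \][^*[:alnum:]_:]+[[:alnum:]_]*:')

printf 'greedy:   %s\n' "$greedy"   # greedy:   [ ] Name: x:
printf 'emulated: %s\n' "$lazy"     # emulated: [ ] Name:
```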

The given regular expression can be represented as multiple BREs:

grep -e '^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]\{1,\}[[:alnum:]_]*:[^*]\{1,\}[[:digit:]]$' \
    -e '[^*]\{1,\}\.com\.au$' file1

or an ERE:

grep -E '^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]+[[:alnum:]_]*:[^*]+[[:digit:]]$|[^*]+\.com\.au$' \
    file1
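To check that the BRE and ERE forms select the same lines, here is a minimal sketch run against made-up sample lines (the asker's real file1 is not shown, so this input is purely an assumption). Only the first and third lines have the "* [ ] word: ... digit" or ".com.au" shape the patterns look for.

```shell
#!/bin/sh
# Hypothetical sample lines standing in for file1
sample='  * [ ] Contact: 0412 345 678
* [x] Done: 42
Website www.example.com.au
* [ ] Nothing to see here'

# Multiple BREs: a line is selected if it matches any of the -e patterns
bre_out=$(printf '%s\n' "$sample" | grep \
    -e '^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]\{1,\}[[:alnum:]_]*:[^*]\{1,\}[[:digit:]]$' \
    -e '[^*]\{1,\}\.com\.au$')

# A single ERE using top-level alternation
ere_out=$(printf '%s\n' "$sample" | grep -E \
    '^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]+[[:alnum:]_]*:[^*]+[[:digit:]]$|[^*]+\.com\.au$')

# Both variables should hold the first and third sample lines
printf '%s\n' "$bre_out"
printf '%s\n' "$ere_out"
```

Splitting the top-level alternation into separate -e options is what makes the BRE version possible at all, since a POSIX BRE has no | operator.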

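Note that the GNU implementation of grep(1) accepts the short character classes \s and \w as non-portable extensions, but not \d. It has no non-greedy repetition either: in its BREs and EREs, +? is not a lazy quantifier, and lazy quantifiers are only available through grep -P (Perl-compatible regular expressions) where that option is supported. Greediness does not change which lines grep selects anyway, only what grep -o prints.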

Upvotes: 0

Kiran Vemuri

Reputation: 3032

pcregrep -M '(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))' file1

did the trick :)

Upvotes: 2

Tim Pote

Reputation: 28049

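I think the problem is that plain grep uses POSIX Basic Regular Expressions, in which (, | and + are ordinary characters, so your pattern is looking for a literal ( and |. Try using either grep -E or egrep. (grep -E is equivalent to egrep; egrep is just shorter to type, although recent GNU grep releases warn that egrep is obsolescent.) GNU grep does understand \s and \w as extensions, but not \d, so use [0-9] instead. The +? can simply become +, since grep has no non-greedy matching and greediness does not affect which lines are selected.

So your command would be:

grep -E '(^\s*\*\s*\[ \][^*]+(\w*:[^*]+[0-9]$)|([^*]+[.]com[.]au$))' file1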

Upvotes: 3
