Novice User

Reputation: 3844

Grep exact match not working

I've a properties file

myprop2=this is with <br/>

When I'm trying to grep for <br>

grep "^myprop.*=.`*<br>*`" MyProject.properties | xargs | cut -d '=' -f 1

Why is it finding myprop2 ?

Note: I'm searching for <br> rather than <br/> (that is, without the closing slash).

Upvotes: 1

Views: 437

Answers (2)

mklement0

Reputation: 440657

Your double-quoted string contains backticks around *<br>* (an instance of command substitution), which means that Bash will attempt to execute *<br>* as a command - which will fail for a variety of reasons - and expand the expression to the stdout output produced by that command.

Since that failed command produces no stdout output, the `...` expression expands to the null (empty) string, which means that grep will see the following string literal:
^myprop.*=.

Any line that starts with myprop, followed at some point by a = and at least one more character, matches this regular expression, irrespective of what follows, which is why the myprop2 line matched.
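To see the expansion for yourself, here is a minimal sketch. It assumes a scratch directory created with mktemp (my addition, so that the stray redirections inside the backticks cannot touch real files) and recreates the one-line properties file from the question:

```shell
# Work in an empty scratch directory so the accidental command cannot touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Inside the backticks, the shell parses *<br>* as: command '*', input from file 'br',
# output to '*'. The input redirection fails because 'br' does not exist, so nothing
# is written to stdout and the substitution expands to the empty string.
# The braces only exist to silence the resulting error message.
{ pattern="^myprop.*=.`*<br>*`"; } 2>/dev/null || true

printf 'pattern seen by grep: %s\n' "$pattern"   # prints: pattern seen by grep: ^myprop.*=.

# Recreate the file from the question and run grep with the expanded pattern.
printf '%s\n' 'myprop2=this is with <br/>' > MyProject.properties
matched=$(grep "$pattern" MyProject.properties)
printf 'matched: %s\n' "$matched"                # prints: matched: myprop2=this is with <br/>
```

The line matches because the part of the pattern that mentioned <br> never reached grep at all.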

If the backticks were meant to be matched as literals, you could either have escaped them as \` or used a single-quoted string instead.
(In case you think that * chars. must be escaped in quoted strings in order to be treated literally: they needn't be; only unquoted use requires escaping.)
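As a quick check of the two quoting options, the following sketch stores the pattern both ways and compares the results; the variable names are just placeholders I chose for illustration:

```shell
# Single quotes: nothing is interpreted, so the backticks stay literal.
single='^myprop.*=.`*<br>*`'

# Double quotes: each backtick must be escaped to prevent command substitution.
escaped="^myprop.*=.\`*<br>*\`"

# Both variables now hold the same literal text, backticks included.
if [ "$single" = "$escaped" ]; then
  printf 'identical: %s\n' "$single"
fi
```

Note that this only shows how to pass backticks through literally; the resulting regular expression would then look for actual backtick characters in the file.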

However, karakfa's helpful answer correctly implies that even if you didn't enclose *<br>* in backticks, following > with the duplication symbol * means that any number of instances of > (including none) matches.

Since grep matches substrings of lines by default, this effectively matches any remainder of the line, including one starting with />, which therefore matches <br/> too.

Therefore, while following > with .* instead of * does solve that problem, it is not necessary - ending the regular expression with > will do.
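A small sketch of the difference, feeding the line from the question to grep on stdin and counting matching lines with -c:

```shell
line='myprop2=this is with <br/>'

# '>*' permits zero '>' characters, so the pattern effectively ends at '<br'
# and the '<br/>' line still matches.
star_count=$(printf '%s\n' "$line" | grep -c '^myprop.*=.*<br>*') || true

# Ending the pattern with a literal '>' requires '<br>' itself, so '<br/>' is rejected.
# grep exits non-zero when nothing matches, hence the '|| true'.
exact_count=$(printf '%s\n' "$line" | grep -c '^myprop.*=.*<br>') || true

printf 'with >*: %s, with >: %s\n' "$star_count" "$exact_count"   # prints: with >*: 1, with >: 0
```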

His solution requires GNU grep, because only GNU grep supports the -P option, which enables PCRE support and with it features such as look-ahead assertions. It can therefore be simplified to:

grep -oP 'myprop.*(?==.*<br>)' MyProject.properties

Note the use of single quotes, which is the better choice for strings that need not be interpolated, to guarantee their use as-is.

If using GNU grep is not an option, use (note that there's no reason to use xargs):

grep '^myprop.*=.*<br>' MyProject.properties | cut -d '=' -f 1

Alternatively, use awk:

awk -F= '$1 ~ /^myprop/ && $2 ~ /<br>/ { print $1 }' MyProject.properties

Or, if it's only about matching the value, irrespective of the property name:

awk -F= '$2 ~ /<br>/ { print $1 }' MyProject.properties
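To compare the portable variants side by side, here is a sketch run against a made-up three-line properties file; the file contents and the temporary path are assumptions for illustration, not your actual data:

```shell
# Hypothetical sample data in a scratch location.
workdir=$(mktemp -d)
props="$workdir/MyProject.properties"
cat > "$props" <<'EOF'
myprop1=first line<br>second line
myprop2=this is with <br/>
other=unrelated <br> value
EOF

# grep + cut: property names starting with myprop whose value contains <br>.
grep_result=$(grep '^myprop.*=.*<br>' "$props" | cut -d '=' -f 1)

# awk equivalent of the above.
awk_result=$(awk -F= '$1 ~ /^myprop/ && $2 ~ /<br>/ { print $1 }' "$props")

# awk, matching on the value only, whatever the property name.
any_name=$(awk -F= '$2 ~ /<br>/ { print $1 }' "$props")

printf 'grep+cut: %s\n' "$grep_result"   # prints: grep+cut: myprop1
printf 'awk:      %s\n' "$awk_result"    # prints: awk:      myprop1
printf 'any name:\n%s\n' "$any_name"     # prints myprop1 and other, one per line
```

In all three cases myprop2 is left out, because its value contains <br/> rather than <br>.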

Upvotes: 1

karakfa

Reputation: 67567

The * after the right angle bracket makes the > match zero or more times, so it is effectively ignored in the match. I think what you meant is ...>.*

Also, using look-ahead you can eliminate some pipes

grep -oP "myprop.(?==.*<br>.*)" file

will give the same without xargs and cut

Upvotes: 2
