Bz16

Reputation: 3

Unexpected behavior of grep in bash regarding lines preceded with several same characters

I am playing Bandit from overthewire.org. Getting to level 10 requires me to find strings preceded by several "=" characters (equal signs) in a text file; I interpreted "several" as "two or more".

The target lines look like this:

========== passwordhere123

i.e. ten equal signs, one space, and a string of letters and numbers, followed by a line break (I am not sure which exact type).

These lines should be excluded:

c========== EqualSignDidNotStartLine

= only-one-equal-sign

equalsign=somewhereElse

No equal signs at all

The original data did not contain any lines starting with fewer than ten but more than one "=". There are some "+" characters (plus signs) scattered through the text, but "+" and "=" never appear on the same line.

The Bandit server runs some kind of Linux with kernel 4.18.12 (uname -r), GNU bash 4.4 (from the man page), and GNU grep 2.27 (from the man page).

The raw data contains non-readable parts, so it is fed through strings first, leaving only human-readable strings for grep to process.

From what I have learned, grep's default regex flavor (BRE, thanks Casimir) should not be too different from PCRE. * is still a quantifier (match the preceding pattern zero or more times), not a standalone pattern meaning "anything, zero or more times". That is why grep's behavior below confuses me.

Edit: per this chart, "+" needs to be escaped (i.e. \+) in BRE. That does not help, though. I will make some more test strings to try to work out what is going on.

Here are the commands I tried:

strings data.txt | grep -P -e ^==+.*

strings data.txt | grep -P -e ^==+.*$ #both PCRE expressions worked correctly

#start BRE

strings data.txt | grep -e ^==.*    #includes every line preceded by at least two =; works

strings data.txt | grep -e ^==.*$   #includes every line preceded by at least two =; works

strings data.txt | grep -e ^==+.*   #no output; why?

strings data.txt | grep -e ^==+.*$  #no output

strings data.txt | grep -e ^==+*    #includes every target line, so works; WHY IS THIS A LEGAL REGEX?

strings data.txt | grep -e ^==+*$   #no output

strings data.txt | grep -e ^==\+.*  #no output

strings data.txt | grep -e ^==\+.*$ #no output

strings data.txt | grep -e ^==\+*   #includes every target line, so works

strings data.txt | grep -e ^==\+*$  #no output

Upvotes: 0

Views: 1094

Answers (1)

landru27

Reputation: 1702

First, I'd be worried about shell expansion. From long experience, I put regexes on the command line in 'single quotes', to avoid meta-character madness.
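To illustrate the quoting point, here is a minimal sketch that uses printf to show what a program actually receives as its argument. It assumes a POSIX-style shell such as bash and a working directory with no file names that happen to match the unquoted pattern; the variable names are only for illustration.

```shell
# Unquoted: the shell removes the backslash before the program ever sees it.
unquoted=$(printf '%s\n' ^==\+.*)

# Single-quoted: the pattern is passed through byte for byte.
quoted=$(printf '%s\n' '^==\+.*')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

So an unquoted ^==\+.* reaches grep as ^==+.*, which would explain why the escaped and unescaped variants in the question behave identically. With GNU grep, a quoted '\+' in a BRE is treated as "one or more" (a GNU extension), but only if the backslash survives the shell.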

Second, this (under BRE):

^==+*

is perfectly valid. It means:

^     anchored at the start of the line
==    followed by 2 '=' characters
+*    followed by 0 or more '+' characters
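As a rough check of this reading, the sketch below runs the pattern with and without a trailing $ against a few made-up lines (the sample text is an assumption modelled on the question, not the real data).

```shell
# Made-up sample lines; the real data.txt is not reproduced here.
sample='========== passwordhere123
==+++
= only-one-equal-sign'

# +* may match zero pluses, so any line starting with "==" is selected.
open_ended=$(printf '%s\n' "$sample" | grep -e '^==+*')

# With $ the line must end right after the pluses, so only "==+++" is left.
anchored=$(printf '%s\n' "$sample" | grep -e '^==+*$')

printf 'open-ended:\n%s\n\nanchored:\n%s\n' "$open_ended" "$anchored"
```

This is consistent with the question's results: without the anchor every target line is printed, and with the anchor nothing is, since no line in the data consists of "==" followed only by plus signs.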

You stated "From what I learned, grep's default regex engine (BRE, thanks Casimir) should not be too different from PCRE" and I think that's your problem. In particular, + is a metacharacter in PCRE, but not in BRE. Observe:

echo '==+++++' | grep ^==+*
==+++++

echo '==+++++' | grep -E ^==+*
grep: repetition-operator operand invalid

The -E on grep enables extended regex.

So, now that you know that + is just a literal + under BRE, can you see why all of your patterns behave the way they do?
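One way to check the reasoning is to run the same pattern in both modes against a small sample. The lines below are made up to resemble the question's description, with one extra line that has a "+" directly after "==" (something the question says never occurs in the real data).

```shell
# Made-up sample lines modelled on the question's description.
sample='========== passwordhere123
c========== EqualSignDidNotStartLine
= only-one-equal-sign
==+++ plus-after-equals'

# BRE (grep's default): + is literal, so "==" must be followed by a real "+".
bre=$(printf '%s\n' "$sample" | grep -e '^==+.*')

# ERE (-E): + is a quantifier, so this means "=" then one or more "=".
ere=$(printf '%s\n' "$sample" | grep -E -e '^==+.*')

printf 'BRE:\n%s\n\nERE:\n%s\n' "$bre" "$ere"
```

Under BRE only the line containing "==+" is selected, which is why ^==+.* prints nothing on data where "+" and "=" never share a line. Under ERE the target line is selected as well.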

Upvotes: 1
