I am playing Bandit on overthewire.org. Getting to level 10 requires me to find, in a text file, strings preceded by several "=" characters (equal signs). I interpreted "several" as "two or more".
The target lines look like this:
========== passwordhere123
i.e. ten equal signs, one space, and a string of letters and numbers, followed by a line break (I am not sure which exact type).
These lines should be excluded:
c========== EqualSignDidNotStartLine
= only-one-equal-sign
equalsign=somewhereElse
No equal signs at all
The original data did not contain any lines preceded by fewer than ten but more than one "=". There are some "+" characters (plus signs) scattered through the text, but "+" and "=" never appear on the same line.
The bandit server runs some Linux distribution with kernel 4.18.12 (from uname -r), GNU bash 4.4, and GNU grep 2.27 (both from their man pages).
The raw data contains non-readable parts, so I first feed it through strings to leave only human-readable strings for grep to process.
From what I learned, grep's default regex engine (BRE, thanks Casimir) should not be too different from PCRE. In particular, * is still a quantifier (match the preceding atom zero or more times), not a standalone pattern meaning "anything, zero or more times". With that in mind, grep's behavior below confuses me.
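A quick way to see that * only repeats the atom right before it is to run a few patterns over some made-up lines. The function name sample and its lines are hypothetical stand-ins for the output of strings data.txt, and GNU grep is assumed:

```shell
# Hypothetical sample lines standing in for `strings data.txt`.
sample() { printf '%s\n' '== two' '========== ten' 'x== no' '= one'; }

# '*' repeats only the atom right before it. '=*' means "zero or more '='",
# so '^=*' can match the empty string at the start of ANY line.
sample | grep -c '^=*'    # prints 4

# '.*' means "any character, zero or more times": '^==.*' needs two
# leading '=' and then anything, so it selects the same lines as '^=='.
sample | grep '^==.*'     # prints '== two' and '========== ten'

# "Two or more '='" in plain BRE, no extensions: '=', '=', then '=*'.
sample | grep '^===*'     # prints '== two' and '========== ten'
```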
Edit: per this chart, "+" needs to be escaped (i.e. \+) in BRE. That does not help, though. I will make some more test strings to try to work out what's going on.
Here are the commands I tried:
strings data.txt | grep -P -e ^==+.*
strings data.txt | grep -P -e ^==+.*$ #both PCRE expressions worked correctly
#start BRE
strings data.txt | grep -e ^==.* #includes every line preceded by at least two =; works
strings data.txt | grep -e ^==.*$ #includes every line preceded by at least two =; works
strings data.txt | grep -e ^==+.* #no output; why?
strings data.txt | grep -e ^==+.*$ #no output
strings data.txt | grep -e ^==+* #includes every target line, so works; WHY IS THIS A LEGAL REGEX?
strings data.txt | grep -e ^==+*$ #no output
strings data.txt | grep -e ^==\+.* #no output
strings data.txt | grep -e ^==\+.*$ #no output
strings data.txt | grep -e ^==\+* #includes every target line, so works
strings data.txt | grep -e ^==\+*$ #no output
First, I'd be worried about shell expansion. From long experience, I put regexes on the command line in 'single quotes' to avoid metacharacter madness.
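One way to see what the shell actually hands to grep is to pass the same word to printf. Assuming bash (or another POSIX shell) with default options, and no file in the current directory whose name matches the word as a glob, the unquoted backslash is removed by the shell before the command runs. That would explain why writing \+ in the question's commands changed nothing: grep received a plain +.

```shell
# printf '%s\n' prints each argument exactly as the shell passed it.
printf '%s\n' ^==\+.*     # unquoted: prints ^==+.*   (backslash removed by the shell)
printf '%s\n' '^==\+.*'   # single-quoted: prints ^==\+.*   (passed through verbatim)
```

So grep -e ^==\+.* in the question is the same command as grep -e ^==+.* once the shell is done with it.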
Second, this (under BRE):
^==+*
is perfectly valid. It means:
^ anchored at the start of the line
== followed by two '=' characters
+* followed by zero or more '+' characters
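To make that concrete, here is a small loop over made-up inputs (a sketch assuming GNU grep). Because nothing anchors the end of the pattern, only a prefix of each line has to match:

```shell
# Under BRE, '+' is an ordinary character, so '^==+*' means:
# two '=' at the start of the line, then zero or more literal '+'.
for s in '==' '==+++++' '== passwordhere123' '=+' '+=='; do
  if printf '%s\n' "$s" | grep -q '^==+*'; then
    echo "match:    $s"
  else
    echo "no match: $s"
  fi
done
```

The first three lines match and the last two do not, which is why ^==+* happened to pick out every target line.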
You stated "From what I learned, grep's default regex engine (BRE, thanks Casimir) should not be too different from PCRE", and I think that's your problem. In particular, + is a metacharacter in PCRE, but not in BRE. Observe:
echo '==+++++' | grep ^==+*
==+++++
echo '==+++++' | grep -E ^==+*
grep: repetition-operator operand invalid
The -E option to grep enables extended regular expressions (ERE).
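The same pattern text can mean different things with and without that flag. A minimal comparison on made-up input, assuming GNU grep:

```shell
# BRE (default): '+' is literal, so '^==+' requires a real plus sign after '=='.
printf '%s\n' '==+' '====' | grep '^==+'      # prints only: ==+

# ERE (-E): '+' means "one or more of the previous atom", so '^==+' is
# '=' followed by one or more '=', i.e. at least two leading '='.
printf '%s\n' '==+' '====' | grep -E '^==+'   # prints both lines
```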
So, now that you know that + is just a literal + under BRE, can you see why all of your patterns behave the way they do?
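If you want to check your reasoning, a small harness can rerun the BRE patterns from the question, this time single-quoted so grep receives exactly what is written, against the sample lines given in the question. This is a sketch assuming GNU grep (where \+ in a BRE is a GNU extension meaning "one or more"); question_lines is a hypothetical helper, and the counts apply to these sample lines only, not to data.txt:

```shell
# Target and excluded lines copied from the question.
question_lines() {
  printf '%s\n' '========== passwordhere123' \
    'c========== EqualSignDidNotStartLine' '= only-one-equal-sign' \
    'equalsign=somewhereElse' 'No equal signs at all'
}

# Pattern    count  why
# ^==+.*     0      '+' is literal in BRE; no sample line contains '==+'
# ^==+*      1      '+*' allows zero '+', so this reduces to '^=='
# ^==+*$     0      '$' forbids anything after the leading '=' and '+' run
# ^==\+.*    1      quoted, so grep sees \+ (GNU "one or more")
for p in '^==+.*' '^==+*' '^==+*$' '^==\+.*'; do
  printf '%-10s %s\n' "$p" "$(question_lines | grep -c -e "$p")"
done
```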