Reputation: 3256
I'm a little confused about how many backslashes are needed to escape the alternation operator | in grep regular expressions. This
echo abcdef | grep -e"def|zzz"
outputs nothing, because grep is not in extended regex mode. Escaping with one backslash works:
echo abcdef | grep -e"def\|zzz"
prints abcdef. More surprisingly, escaping with two backslashes also works:
echo abcdef | grep -e"def\\|zzz"
prints abcdef. Escaping with three backslashes fails:
echo abcdef | grep -e"def\\\|zzz"
prints nothing.
Does anyone have an explanation, especially for the two-backslash case?
Edit:
Using this simple argument-printing program,
#include <stdio.h>

int main(int argc, char **argv)
{
    for (int i = 0; i < argc; i++)
        printf("Arg %d: %s\n", i, argv[i]);
    return 0;
}
I investigated what my shell does with the command lines above:
-e"def|zzz" becomes -edef|zzz
-e"def\|zzz" becomes -edef\|zzz
-e"def\\|zzz" becomes -edef\\|zzz
-e"def\\\|zzz" becomes -edef\\\|zzz
So all double quotes are removed, and the shell does not alter the backslashes or pipes. I suspect grep itself does something special with the literal string \\|.
Upvotes: 13
Views: 70776
Reputation: 226296
The lowercase -e option is used to specify multiple search patterns. The alternation is implied:
$ echo abcdef | grep -e 'def' -e'zzz'
abcdef
$ echo abczzz | grep -e 'def' -e'zzz'
abczzz
Alternatively, you can use the uppercase -E option for extended regular expression notation:
$ echo abcdef | grep -E 'def|zzz'
abcdef
I believe this solves your problem directly (either use -e for alternation or -E for extended regex notation). Hope this helps :-)
FWIW, the issue with the backslashes comes from bash's quoting rules: | is special to bash unless it is quoted, and inside double quotes (unlike single quotes) bash still processes some backslashes. Here is a resource on quoting and escaping rules and the common pitfalls: https://web.archive.org/web/20230323230844/https://wiki.bash-hackers.org/syntax/quoting
Upvotes: 8
Reputation: 6335
According to the grep man pages, and especially the info pages, all the examples given for grep use single quotes, not double quotes.
Running similar tests with single quotes gives different, correct behavior:
$ cat file1
def
def\
def\\
def\\\
def\|
aaa
nnn
$ cat -n file1 |grep -e 'def|zzz' #No results
$ cat -n file1 |grep -e 'def\|zzz'
1 def
2 def\
3 def\\
4 def\\\
5 def\|
$ cat -n file1 |grep -e 'def\\|zzz' #No results
$ cat -n file1 |grep -e 'def\\\|zzz'
2 def\
3 def\\
4 def\\\
5 def\|
$ cat -n file1 |grep -e 'def\\\\|zzz' #No results
$ cat -n file1 |grep -e 'def\\\\\|zzz'
3 def\\
4 def\\\
Conclusion: for regexes in grep, use single quotes.
But to be honest, I don't know why the behavior is completely different with double quotes. It probably has something to do with how bash handles double-quoted arguments.
Update
These results from a small bash function show that bash interprets arguments differently in single and double quotes:
function tt { printf "%s: %s\n" "$1" "$2"; }
tt -e 'def\\|aaa' #Parsed correctly
tt -e 'def\\\|aaa' #We send three backslashes - function gets three backslashes
tt -e 'def\\\\|aaa' #We send four backslashes - function gets four backslashes
tt -e "def\\|aaa" #We send two backslashes but function displays ONE
tt -e "def\\\|aaa" #We send three backslashes but function displays TWO
tt -e "def\\\\|aaa" #We send four backslashes but function displays TWO
#Output
-e: def\\|aaa
-e: def\\\|aaa
-e: def\\\\|aaa
-e: def\|aaa
-e: def\\|aaa
-e: def\\|aaa
Note the cases of three and four backslashes inside double quotes: both reach the function as two.
One more step:
tt -e 'def\|aaa' #Displays def\|aaa (correct parsing)
tt -e 'def\\|aaa' #Displays def\\|aaa (correct parsing)
tt -e "def\|aaa" #Displays def\|aaa (correct parsing)
tt -e "def\\|aaa" #Displays def\|aaa (same as before - not correct parsing)
The last two double-quoted lines above probably explain why \| and \\| in your test behave the same way when enclosed in double quotes.
Upvotes: 0
Reputation: 52122
If you double quote your regex, the shell treats backslashes specially (emphasis mine):
The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or newline. Within double quotes, backslashes that are followed by one of these characters are removed.
This means that your expressions are treated as follows:
grep -e"def|zzz": grep receives def|zzz; because grep defaults to basic regular expressions (BRE), | isn't special[1], and grep tries to match the literal string def|zzz.
grep -e"def\|zzz": | isn't one of the special characters listed above, so grep receives def\|zzz, and GNU grep treats \| as alternation[1].
grep -e"def\\|zzz": \\ is special according to the manual excerpt (try echo "\\"); the shell removes one backslash, so grep sees def\|zzz and the behaviour is the same as in the second case.
grep -e"def\\\|zzz": the shell turns this into def\\|zzz (\\ becomes \, while \| isn't special to the shell and stays unchanged); grep sees \\ as a literal backslash (a backslash escaped by a backslash), so | isn't special, and grep tries to match the exact string def\|zzz.
In general, it's prudent to single quote your regular expression so the shell leaves it alone.
As a side note, I don't think your C program is representative of how the shell processes arguments; as the Bash manual's Shell Operation section describes, quoting is a separate step and includes backslash processing (see Escape Character).
[1] As an extension, GNU grep lets you escape | in a BRE to get alternation; POSIX BREs have no alternation. As a consequence, for GNU grep the only difference between grep and grep -E is what has to be escaped; the functionality is identical.
Upvotes: 5
Reputation: 425003
The first fails because in a basic regular expression (grep's default) an unescaped pipe is an ordinary character, resulting in a literal pipe in the regex.
The last attempt fails because \\\| results in a literal backslash followed by a literal pipe in the regex.
echo 'def|zzz' | grep -e "def|zzz" --> def|zzz
echo 'def\|zzz' | grep -e "def\\\|zzz" --> def\|zzz
Upvotes: 1