Reputation: 313
Here's my question. For example, for the input
echo 123\< abc\\\ efg
The output should be
123< abc\ efg
The regex in my lex file is
[^\n ]*[\\]+[^\n]
If I use this regex, my output is going to be
123< abc\ efg
which is wrong. Can anybody tell me how to match an escaped space (\ followed by a space) and a regular space separately?
Thanks!
Upvotes: 0
Views: 2792
Reputation: 241811
I believe that what you are looking for is a flex regular expression which will match a single shell token which does not contain quotes or other such complications.
Note that the characters which automatically terminate tokens are the following: ();<>&| and whitespace. (The bash manual says space and tab, but I'm pretty sure that newline also separates words.)
Such a regular expression is possible, but (imho) it is of little use, partly because it doesn't take quoting into account (or bracketing: a$(echo foo)b is a single word), and partly because the resulting word needs to be rescanned for escape characters. But whatever. Here's a sample flex regex:
([^();<>&|\\[:space:]]|\\(.|\n))+
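That matches any number of consecutive instances of:
- any character other than one of the metacharacters ();<>&|, a backslash, or whitespace; or
- a backslash followed by any character, including a newline.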
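For anyone who wants to try the pattern outside flex, here is a rough Python sketch. It is an approximation, not the flex rule itself: [:space:] is spelled out as an explicit ASCII whitespace set, the names WORD, split_words and unescape are made up for this example, and unescape is only a simplified version of the rescanning step mentioned above (it drops backslash-newline pairs and otherwise keeps the character after each backslash).

```python
import re

# Approximate Python spelling of the flex pattern
#   ([^();<>&|\\[:space:]]|\\(.|\n))+
# with [:space:] written out as the ASCII whitespace characters.
WORD = re.compile(r'(?:[^();<>&|\\ \t\n\r\f\v]|\\(?:.|\n))+')


def split_words(line):
    """Return every maximal run matched by WORD, escapes still in place."""
    return WORD.findall(line)


def unescape(word):
    """Second pass (simplified): drop backslash-newline pairs,
    otherwise keep only the character that follows each backslash."""
    return re.sub(
        r'\\(.|\n)',
        lambda m: '' if m.group(1) == '\n' else m.group(1),
        word,
    )


if __name__ == '__main__':
    for raw in split_words(r'123\< abc\\\ efg'):
        # Print the raw word and its unescaped form side by side.
        print(raw, '->', unescape(raw))
```

On the example line from the question this finds two words, 123\< and abc\\\ efg, which unescape to 123< and abc\ efg. The escaped space stays inside the second word because the backslash branch consumes it before the whitespace exclusion is ever tested.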
Upvotes: 1
Reputation: 531460
Your regex is correct. When you type at the prompt
echo 123\< abc\\\ efg
the following happens:
1. bash replaces \< with < (without the backslash, bash would treat < as an input redirection operator).
2. bash replaces \\ with a single literal \.
3. bash replaces \ followed by a space with a single literal space.
4. bash calls the echo command, passing it 2 arguments: 123< and abc\ efg.
5. echo produces the output 123< abc\ efg, a single string with a single space separating its two arguments.
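The steps above can be checked roughly with Python's shlex module, whose POSIX mode applies similar backslash rules to unquoted text. This is only a stand-in for bash, not bash itself (shlex knows nothing about redirection, for instance), and the helper name echo_like is invented for this sketch.

```python
import shlex


def echo_like(command_line):
    """Split the line roughly as a POSIX shell would, then mimic echo."""
    argv = shlex.split(command_line)  # backslashes are consumed here (steps 1-3)
    args = argv[1:]                   # the arguments handed to echo (step 4)
    return args, ' '.join(args)       # echo's output without the newline (step 5)


if __name__ == '__main__':
    args, output = echo_like(r'echo 123\< abc\\\ efg')
    print(len(args), 'arguments')
    print(output)
    print(len(output), 'bytes')
```

Under these assumptions the split yields two arguments, and joining them with one space gives the 13-byte string 123< abc\ efg.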
Based on your regular expression, it looks like the string output in my step 5 above is what is stored in your file. From those 13 bytes, it would find 3 valid tokens: 123<, abc\ , and efg. If it prints them to standard output as a single string with a space separating each token, you would see 123< abc\ efg. (There should be two spaces following that backslash; I can't seem to get multiple spaces to display.)
Upvotes: 0