Lamian

Reputation: 313

shell and regex matching spaces

Here's my question:

For example:

echo 123\<  abc\\\ efg

The output should be

123< abc\ efg

My regex in lex file is

[^\n ]*[\\]+[^\n]
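To see what this pattern actually picks out of the sample line, here is a minimal Python sketch using the standard re module. It assumes the lexer is fed the raw command line, with escapes not yet removed. Python's backtracking matcher is not flex's longest-match scanner, but for this particular pattern and input both should select the same spans.

```python
import re

# The asker's lex pattern, transcribed unchanged into Python regex syntax.
ASKER_PATTERN = re.compile(r"[^\n ]*[\\]+[^\n]")

# Assumption: the lexer sees the raw command line, before bash removes escapes.
raw_line = r"123\<  abc\\\ efg"

# findall returns the non-overlapping matches from left to right;
# characters that no match covers (the two spaces, "efg") are simply skipped.
matches = ASKER_PATTERN.findall(raw_line)
for m in matches:
    print(repr(m))
```

Only two spans match: 123\< and abc\\\ followed by the escaped space itself. The trailing efg is never matched, because the pattern requires at least one backslash, and the two unescaped spaces are not part of any match.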

If I use this regex, my output is going to be

 123< abc\  efg

which is wrong. Can anybody tell me how to match an escaped space (\ followed by a space) and a regular space separately?

Thanks!

Upvotes: 0

Views: 2792

Answers (2)

rici

Reputation: 241811

I believe that what you are looking for is a flex regular expression which will match a single shell token which does not contain quotes or other such complications.

Note that the characters which automatically terminate tokens are the following: ();<>&| and whitespace. (The bash manual says space and tab, but I'm pretty sure that newline also separates words.)

Such a regular expression is possible, but (imho) it is of little use, partly because it doesn't take quoting (or bracketing: a$(echo foo)b is a single word) into account, and partly because the resulting word needs to be rescanned for escape characters. But whatever. Here's a sample flex regex:

([^();<>&|\\[:space:]]|\\(.|\n))+

That matches one or more consecutive instances of:

  • anything other than a metacharacter or an escape character, or
  • an escape character followed by any single character other than a newline, or
  • an escape character followed by a newline.
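The pattern can be tried out in Python as well. The sketch below is a hand transcription of the flex regex into Python's re syntax, with [:space:] spelled out as the six ASCII whitespace characters. It is followed by a hypothetical unescape() helper (not part of the original answer) that performs the rescanning step mentioned above: a backslash-newline pair is dropped as a line continuation, and any other escaped character is kept literally. Like the flex pattern, it ignores quoting entirely.

```python
import re

# Python transcription of the flex pattern
#   ([^();<>&|\\[:space:]]|\\(.|\n))+
# [:space:] is spelled out as space, \t, \n, \v, \f, \r. As in flex, "."
# does not match a newline, so the escaped newline is listed separately.
WORD = re.compile(r"(?:[^();<>&|\\ \t\n\v\f\r]|\\(?:.|\n))+")

# Hypothetical helper for the "rescan" step: remove escapes from a word.
ESCAPE = re.compile(r"\\(.|\n)")

def unescape(word: str) -> str:
    # Backslash-newline is a line continuation and disappears entirely;
    # any other escaped character is kept without its backslash.
    return ESCAPE.sub(lambda m: "" if m.group(1) == "\n" else m.group(1), word)

line = r"123\<  abc\\\ efg"
words = WORD.findall(line)            # raw words, escapes still present
args = [unescape(w) for w in words]   # words after escape removal
print(" ".join(args))                 # prints: 123< abc\ efg
```

The regex alone yields the raw words 123\< and abc\\\ efg; only after the rescan do they become 123< and abc\ efg, which is why a single regex cannot produce the final text by itself.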

Upvotes: 1

chepner

Reputation: 531460

Your regex is correct. When you type at the prompt

echo 123\<  abc\\\ efg

the following happens:

  1. bash replaces \< with < (without the backslash, bash would treat < as an input redirection operator).

  2. bash replaces \\ with a single literal \

  3. bash replaces the escaped space (\ followed by a space) with a single literal space.

  4. bash calls the echo command, passing it 2 arguments: 123< and abc\ efg.

  5. echo produces the output 123< abc\ efg, a single string with a single space separating its two arguments.
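Steps 1 through 5 can be approximated with Python's standard shlex module, whose split() function applies POSIX shell escaping and word-splitting rules. This is only a sketch, not bash itself: shlex does not treat < as a redirection operator or perform any expansions, so it models just the escape removal and word splitting described in the steps.

```python
import shlex

# Steps 1-4: remove backslash escapes and split into words (POSIX mode).
command = r"echo 123\<  abc\\\ efg"
argv = shlex.split(command)
print(argv)                    # ['echo', '123<', 'abc\\ efg']

# Step 5: echo writes its arguments separated by single spaces.
output = " ".join(argv[1:])
print(output, len(output))     # 123< abc\ efg 13
```

The command receives exactly two arguments, and the text echo prints (before its trailing newline) is the 13 bytes mentioned below.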

Based on your regular expression, it looks like the string output in my step 5 above is what is stored in your file. From those 13 bytes, it would find 3 valid tokens: 123<, abc\, and efg. If it prints them to standard output as a single string with a space separating each token, you would see 123< abc\ efg. (There should be two spaces following that backslash; I can't seem to get multiple spaces to display.)

Upvotes: 0
