DimG

Reputation: 1781

Escaping slashes and tabs

I'm trying to split a line on the two-byte delimiter `\t` (a literal backslash followed by `t`) and ran into some strange behaviour:

a) $> echo '1\2' | awk -F'\' '{print NF}'
2

b) $> echo '1\2' | awk -F'\\' '{print NF}'
2

c) $> echo '1\2' | awk -F'\\\\' '{print NF}'
2

d) $> echo '1\\2' | awk -F'\\\\\\\\' '{print NF}'
2 # splits by `\\` (two bytes) only with 8 backslashes

e) $> echo '1\t2' | awk -F'\\\\t' '{print NF}'
2 # splits by `\t` (two bytes) only with 4 backslashes

It seems that there are two reductions taking place:

  1. \\ -> \ : awk receives half the number of backslashes in the -F parameter value. If so, what's the rule? Note that I'm not using a C-escaped string ($'' notation)
  2. awk itself reduces \\ -> \, as described in the docs

What's happening here?

Upvotes: 0

Views: 49

Answers (1)

Barmar

Reputation: 782148

Backslash has no special meaning to the shell when it's inside single-quoted strings; the shell only treats it as an escape character in double-quoted or unquoted strings. So all the reductions are being done by awk, not bash.
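
A quick way to check this is to look at the length of a quoted string in the shell itself. This is only a minimal sketch; the variable names `s` and `d` are illustrative and not from the question.

```shell
# Inside single quotes the shell does no escape processing,
# so four backslashes stay four characters.
s='\\\\'
echo "${#s}"    # 4

# Inside double quotes the shell itself reduces each \\ to \.
d="\\\\"
echo "${#d}"    # 2
```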

The -F option is essentially equivalent to the -v option with the variable FS, so

awk -F'\\'

is like

awk -v FS='\\'

According to the AWK Manual:

awk processes the values of command line assignments for escape sequences (see Section 8.1 [Constant Expressions], page 57).

That means it's like having the following assignment in the BEGIN block:

awk 'BEGIN {FS="\\"} ...'

The first backslash escapes the second one, so it assigns a single backslash to the variable.
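
This first reduction can be observed by printing the length of a variable assigned with -v. The sketch below assumes an awk that processes escape sequences in command-line assignments as the manual describes (gawk does); the variable name `s` is illustrative.

```shell
# awk receives four backslashes but stores two after escape processing.
awk -v s='\\\\' 'BEGIN { print length(s) }'    # 2

# \t in a command-line assignment becomes a single tab character.
awk -v s='a\tb' 'BEGIN { print length(s) }'    # 3
```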

However, it's not precisely equivalent to substituting that assignment. In the case of an odd number of backslashes, if you were to write something like

awk 'BEGIN {FS="\\\"} ...'

the third backslash escapes the following double quote, preventing it from ending the string (resulting in an "unterminated string" error). Since there are no actual double quotes when the variable is assigned using -v, there are no quotes to be escaped. The extra backslash in -v is simply treated literally, since there's nothing after it for it to escape.

A further issue is that the value of FS is treated as a regular expression. Regexp also uses backslash as an escape character, so another level of reduction takes place.
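
Putting both levels together for the multi-character cases in the question, here is a rough sketch. It uses `printf '%s\n'` instead of `echo` so that the input bytes do not depend on how a particular shell's `echo` treats backslashes; the expected counts assume gawk-like behaviour.

```shell
# -F'\\\\t': escape processing leaves \\t, and the regex \\t matches a
# literal backslash followed by "t".
printf '%s\n' '1\t2' | awk -F'\\\\t' '{ print NF }'       # 2

# Eight backslashes reduce to four, and the regex \\\\ matches two
# literal backslashes.
printf '%s\n' '1\\2' | awk -F'\\\\\\\\' '{ print NF }'    # 2

# Four backslashes reduce to \\, a regex for ONE backslash, so the input
# splits at each of its two backslashes, with an empty field in between.
printf '%s\n' '1\\2' | awk -F'\\\\' '{ print NF }'        # 3
```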

Upvotes: 1
