Sergej Andrejev

Reputation: 9423

Awk field separator behaviour

Why is it that this awk script:

awk '{FS = "\t" ; print $1 " - " $2}' A.txt

with this input file A.txt

B A A1
C B A2
D A A3

outputs these results

B - A
C B - A2
D A - A3

Note that between the first B and A there is a space and not a tab character. I double-checked this.

Upvotes: 5

Views: 7750

Answers (4)

dubiousjim

Reputation: 4822

First off, you are setting the variable FS on every line; you probably only intend to set it once. Also, if you do want to change FS, you probably want to change it before any lines are parsed. POSIX requires that any change to FS only affects the parsing of the next line. (Many implementations don't yet conform to that requirement, and might use the changed value of FS for the current line if the current line hasn't yet been parsed.) To solve both of these issues, you should change FS like this:

awk 'BEGIN { FS="\t" } {...}' A.txt

or this:

awk -v 'FS=\t' '{...}' A.txt

(There's also a form using -F '\t' instead of -v 'FS=\t', but some implementations of awk won't honor the C-escape \t in the former construction.)

But note that FS governs the parsing of input data, whereas OFS governs the formatting of output data. It's not clear from your question which you want to be doing. At first glance, your input data doesn't look like it has any tabs in it, so you probably want to leave FS at its default value of " ".

If you want to change the output formatting, you could set OFS to "\t", in either of the ways we just described for FS. It's not clear that that's what you want, either, though, since you're not making any use of OFS in your test script. When you say:

print $1 " - " $2

you're printing a single argument, which is the concatenation of $1 and " - " and $2. To make use of OFS, you'd have to print several arguments separated by commas, for example like this:

print $1, $2
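To make the difference concrete, here is a minimal sketch. The two-line input is made up for illustration and is split with the default FS; only the comma form picks up OFS:

```shell
# Hypothetical whitespace-separated input, split with the default FS.
# With commas, print joins its arguments with OFS (a tab here).
printf 'B A A1\nC B A2\n' | awk 'BEGIN { OFS = "\t" } { print $1, $2 }'

# Without commas, the pieces are concatenated into one argument and OFS is never used.
printf 'B A A1\nC B A2\n' | awk 'BEGIN { OFS = "\t" } { print $1 " - " $2 }'
```

The first command should print B<tab>A and C<tab>B; the second should print "B - A" and "C - B".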

Puzzled, I look again at the sample data and output you present. Maybe your sample data really has the format: B<space>A<tab>A1, and maybe you do intend to be setting FS so as to be grabbing the B<space>A in $1, and the A1 in $2. If that's right, then just be sure to set FS at the right time, before any line-processing begins. Then your script should work no matter what awk implementation you use.
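As a rough sketch of that last case, assuming the file really is laid out as B<space>A<tab>A1 (my guess at the data, not something confirmed in the question), setting FS in BEGIN splits every line on the tab, including the first one:

```shell
# Assumed layout of A.txt: a space inside the first field, a tab between fields.
printf 'B A\tA1\nC B\tA2\nD A\tA3\n' > A.txt

# FS is set in BEGIN, so it is already in effect when the first record is split.
awk 'BEGIN { FS = "\t" } { print $1 " - " $2 }' A.txt
```

With that input this should print "B A - A1", "C B - A2" and "D A - A3".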

Upvotes: 2

user332325

Reputation:

I believe it's because FS is being set in the first action. Before the first action is invoked, the splitting of the first line is done already, and it uses the default FS (whitespace).

So to get consistent results, you should invoke awk with the -F option.
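A minimal sketch of that, assuming the input uses a tab between fields as the question's output suggests. Passing a literal tab through a shell variable (the name tab is only for illustration) avoids depending on how a given awk treats the escape in -F '\t':

```shell
# Hold a literal tab character in a shell variable.
tab=$(printf '\t')

# -F sets FS before any input is read, so the first line is split the same way as the rest.
printf 'B A\tA1\nC B\tA2\n' | awk -F "$tab" '{ print $1 " - " $2 }'
```

For this made-up input the command should print "B A - A1" and "C B - A2".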

Upvotes: 7

Dr. belisarius

Reputation: 61066

The correct way is:

BEGIN {FS = "\t"}
{ print $1 " - " $2}  

You are setting FS too late (after the first line has already been split).

Upvotes: 7

Foo Bah

Reputation: 26281

If you don't put a comma between the expressions, awk just concatenates the strings.

Change the command to:

print $1, " - ", $2

Also, you probably want to set OFS for the output.

Upvotes: 0
