fedorqui

Reputation: 290105

Why is `awk 1 RS=, <<< "1,2"` writing an extra new line?

I answered a question in which I used a little awk trick to convert commas into newlines:

awk 1 RS=, file

However, I then noticed that this adds an extra newline at the end of the output:

$ cat a
1,2
$ awk 1 RS=, a
1
2
             # one extra line
$ awk 1 RS=, <<< "1,2"
1
2
             # one extra line

Since 1 is shorthand for {print $0} (an always-true pattern that triggers the default action), I decided to see what is going on:

$ awk '{print $0, "hey"}' RS=, <<< "1,2"
1 hey
2
 hey

So yes, the splitting is done, but for some reason the second record consists of 2 followed by a newline. And yes, awk sees just two records:

$ awk '{print NR}' RS=, <<< "1,2"
1
2
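One way to see exactly what each record contains is to print it between brackets. This is a debugging sketch, not part of the original post; it feeds the input with printf '1,2\n', which writes the same four bytes as <<< "1,2", so it also runs in shells without here-strings:

```shell
# Print every record wrapped in [ ] so that a newline inside a record shows up.
# printf '1,2\n' produces the same input as the here-string <<< "1,2".
printf '1,2\n' | awk '{ printf "[%s]\n", $0 }' RS=,
# [1]
# [2
# ]
```

The closing bracket landing on its own line shows that the second record is "2" plus a newline.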

To me this made sense, since echo and here-strings add a newline to the end of their output, while printf does not. And indeed it works fine with printf:

$ awk '{print $0, "hey"}' RS=, < <(printf "1,2")
1 hey
2 hey         # no more lines after this

OK, I thought: then it is just a problem with the newline that gets appended to the end of the string.

But then I saw that this is not always the case, and my confusion grew:

$ awk '{print $0, "hey"}' <<< "1,2"
1,2 hey         # no more lines after this

So my question is: what is RS=, doing to cause this extra new line to be appended?

Upvotes: 2

Views: 431

Answers (3)

Ed Morton

Reputation: 204164

It's not awk that's adding a newline, it's <<<. If the shell didn't add a terminating newline to the end of the text you specify using <<< then the result would not be a text "file" per POSIX and so would rely on undefined behavior from any tool trying to parse it.
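A quick way to confirm this is to count the bytes the command actually receives. The sketch below assumes bash, since here-strings are not part of POSIX sh:

```shell
#!/usr/bin/env bash
# Count the bytes that reach stdin: "1,2" is 3 characters.
wc -c <<< "1,2"          # reports 4: the here-string appends a newline
printf '1,2' | wc -c     # reports 3: printf adds nothing that is not in the format
```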

So when you write command <<< 'foo', what command sees is not foo but foo\n, and therefore in your command line:

awk 1 RS=, <<< "1,2"

what awk sees is 1,2\n, and when you split that into records at the comma you get a first record of 1 and a second record of 2\n.
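If the input has to keep its trailing newline (as it does with <<<), one option, sketched here rather than taken from the answer, is to strip a newline from the end of each record before printing it. printf '1,2\n' stands in for <<< "1,2" so the snippet also runs in plain sh:

```shell
# RS="," splits on commas; sub() then deletes a newline at the very end of the
# record, which only the last record ("2\n") has.
printf '1,2\n' | awk '{ sub(/\n$/, ""); print }' RS=,
# 1
# 2
```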

Upvotes: 2

Tom Fenech

Reputation: 74685

Awk processes each record, automatically removing the record separator from the end. If you change RS to something other than a newline, the newline at the end of the input is no longer a separator, so it is not removed and stays part of the last record. That is why you see this behaviour.
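To check this, you can print the length of each record. The sketch below is not from the answer; it uses length($0), and printf '1,2\n' again stands in for the here-string:

```shell
# With RS=",", the newline at the end of the input is ordinary data,
# so it is counted as part of the last record.
printf '1,2\n' | awk '{ print NR ": length " length($0) }' RS=,
# 1: length 1
# 2: length 2

# With the default RS (newline), that same newline is the separator and is stripped.
printf '1,2\n' | awk '{ print NR ": length " length($0) }'
# 1: length 3
```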

Your "record count" is 2 even though you only have one , but it's also 2 in this example (which hopefully doesn't make this even more confusing!):

$ printf 'a\nb' | awk '{print NR}'
1
2
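The same rule holds for any single-character RS: text after the last separator still forms a record, but a separator at the very end of the input does not start an extra, empty one. A small sketch to check it, printing NR in the END block:

```shell
# Each line prints the number of records awk read.
printf 'a\nb'   | awk 'END { print NR }'        # 2: last record has no terminator
printf 'a\nb\n' | awk 'END { print NR }'        # 2: trailing newline just ends record "b"
printf '1,2'    | awk 'END { print NR }' RS=,   # 2
printf '1,2,'   | awk 'END { print NR }' RS=,   # 2: trailing comma just ends record "2"
```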

Upvotes: 2

karakfa

Reputation: 67507

It's the newline in the input stream.

$ awk 1 RS=, < <(echo -n 1,2)
1
2

won't have the extra newline in the output. However, the standard way to do this is with tr:

$ tr ',' '\n' < file

Compare:

$ echo 1,2 | awk 1 RS=,
1
2
             # one extra line
$ echo 1,2 | tr ',' '\n'
1
2
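Since echo -n is not portable (POSIX leaves its behaviour implementation-defined), printf '%s' is a safer way to pass the string without a trailing newline. The sketch below, which is not part of the answer, counts output lines with wc -l to show where the extra line comes from:

```shell
# wc -l counts the newline characters in each pipeline's output.
printf '%s' 1,2 | awk 1 RS=,  | wc -l   # 2: no trailing newline in the input
printf '1,2\n'  | awk 1 RS=,  | wc -l   # 3: last record is "2\n", then awk adds ORS
printf '1,2\n'  | tr ',' '\n' | wc -l   # 2: tr only translates the commas
```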

Upvotes: 2
