Reputation: 290105
I answered a question in which I used a little awk trick to convert commas into newlines:
awk 1 RS=, file
However, I then noticed that this introduces an extra newline at the end of the output:
$ cat a
1,2
$ awk 1 RS=, a
1
2
# one extra line
$ awk 1 RS=, <<< "1,2"
1
2
# one extra line
Since 1 is shorthand for {print $0}, I decided to see what was going on:
$ awk '{print $0, "hey"}' RS=, <<< "1,2"
1 hey
2
hey
So the splitting does happen, but for some reason the second record consists of 2 followed by a newline. And awk indeed sees just two records:
$ awk '{print NR}' RS=, <<< "1,2"
1
2
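A minimal sketch (assuming bash and a POSIX awk) that makes the hidden newline visible by wrapping each record in brackets. printf '1,2\n' is used here only because it sends the same bytes as the here-string <<< "1,2":

```shell
#!/usr/bin/env bash
# Wrap each record in [ ] so a newline embedded in $0 becomes visible.
# printf '1,2\n' sends the same bytes as the here-string <<< "1,2".
printf '1,2\n' | awk '{ printf "[%s]\n", $0 }' RS=,
# Output:
# [1]
# [2
# ]
```

The closing bracket lands on its own line, which shows the newline is inside the second record rather than being added by awk afterwards.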
This made sense to me, since echo and here-strings append such a newline to the end of their output, while printf does not. And indeed it works fine with printf:
$ awk '{print $0, "hey"}' RS=, < <(printf "1,2")
1 hey
2 hey # no more lines after this
OK, I thought: then it is just a problem with the newline that gets appended to the end of the string.
But then I saw that this is not always the case, and my confusion grew:
$ awk '{print $0, "hey"}' <<< "1,2"
1,2 hey # no more lines after this
So my question is: what is RS=, doing to cause this extra newline to be appended?
Upvotes: 2
Views: 431
Reputation: 204164
It's not awk that's adding a newline, it's <<<. If the shell didn't add a terminating newline to the end of the text you specify using <<<, then the result would not be a text "file" per POSIX, and so would rely on undefined behavior from any tool trying to parse it.
So when you write command <<< 'foo', what command sees is not foo, it's foo\n, and therefore in your command line:
awk 1 RS=, <<< "1,2"
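what awk sees is 1,2\n, and when you split that into records at the , you get a first record of 1 and a second record of 2\n. That trailing newline is part of the second record, so print outputs it and then appends its own ORS newline, which is the extra line you see.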
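A quick check of the bytes each form actually delivers on stdin, as a sketch assuming bash (for <<<) and a standard wc; the tr -d ' ' is only there to strip the padding some wc implementations print:

```shell
#!/usr/bin/env bash
# printf '%s' writes exactly its argument: 1 , 2
printf '%s' '1,2' | wc -c | tr -d ' '   # 3
# The here-string appends a newline: 1 , 2 \n
wc -c <<< '1,2' | tr -d ' '             # 4
```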
Upvotes: 2
Reputation: 74685
Awk processes each record, automatically removing the record separator from the end. If you've changed RS to something other than a newline, the trailing newline is no longer a separator, so it isn't removed and stays part of the last record, which is why you end up with this behaviour.
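If you want to keep RS=, and still avoid the blank line, one possible workaround (not taken from any answer here; just a sketch assuming a POSIX awk) is to strip a single trailing newline from each record before the default print action runs:

```shell
#!/usr/bin/env bash
# sub(/\n$/, "") removes one newline at the very end of $0, if present;
# the pattern 1 then prints the cleaned record followed by ORS.
printf '1,2\n' | awk '{ sub(/\n$/, "") } 1' RS=,
# Output:
# 1
# 2
```

Note that input ending in ",\n" would still print an empty final line, because the record consisting only of the newline becomes empty rather than disappearing.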
Your "record count" is 2 even though you only have one comma, but it's also 2 in this example (which hopefully doesn't make this even more confusing!):
$ printf 'a\nb' | awk '{print NR}'
1
2
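A sketch (assuming a POSIX awk) of how awk treats the separator as a record terminator: a final unterminated chunk still counts as a record, a trailing separator does not start an empty one, and with RS=, a trailing newline is just more record content:

```shell
#!/usr/bin/env bash
printf 'a\nb\n'  | awk 'END { print NR }'        # 2: both records terminated
printf 'a\nb'    | awk 'END { print NR }'        # 2: unterminated "b" still counts
printf '1,2,'    | awk 'END { print NR }' RS=,   # 2: no empty record after the last ,
printf '1,2\n'   | awk 'END { print NR }' RS=,   # 2: the last record is "2\n"
printf '1,2,\n'  | awk 'END { print NR }' RS=,   # 3: the lone "\n" becomes a record
```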
Upvotes: 2
Reputation: 67507
It's the newline in the input stream. If you don't send one, for example with echo -n:
$ awk 1 RS=, < <(echo -n 1,2)
1
2
there is no extra newline in the output. However, the standard way to do this is with tr:
$ tr ',' '\n' < file
Compare:
$ echo 1,2 | awk 1 RS=,
1
2
$ echo 1,2 | tr ',' '\n'
1
2
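The same idea as the echo -n version, sketched with printf instead (assuming bash and a POSIX awk): POSIX leaves echo's handling of -n implementation-defined, and some implementations print the -n literally, while printf '%s' portably writes its argument without a trailing newline:

```shell
#!/usr/bin/env bash
# No trailing newline reaches awk, so the last record is just "2".
printf '%s' '1,2' | awk 1 RS=,
# Output:
# 1
# 2
```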
Upvotes: 2