Reputation: 680
Here are the contents of the file:
Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754
And these two things give the same result:
$ awk 'BEGIN{FS="\n"; RS="\n\n"} {print $1, $3}' file_contents
$ awk 'BEGIN{FS="\n"; RS=""} {print $1, $3}' file_contents
The result given in both cases is:
Person Name (222) 466-1234
Another person (523) 643-8754
RS="\n\n" actually makes sense, but why is RS="" also treated the same way?
Upvotes: 2
Views: 388
Reputation: 204124
They aren't treated the same.
RS="" invokes paragraph mode in all awks, so the input is split into records separated by contiguous sequences of empty lines, and a newline is added to the FS if the existing FS is a single character (note: the POSIX standard is incorrect in this area, as it implies \n would get added to any FS, but that's not the case; see https://lists.gnu.org/archive/html/bug-gawk/2019-04/msg00029.html).
RS="\n\n" works in GNU awk to set the record separator to a single blank line and does not affect FS. In all other awks the 2nd \n will be ignored (more than 1 char in an RS is undefined behavior per POSIX, so they COULD do anything, but that's by far the most common implementation).
Look what happens when you have 3 blank lines between your 2 blocks of text and use an FS other than \n (e.g. ,):
$ cat file
Person Name
123 High Street
(222) 466-1234



Another person
487 High Street
(523) 643-8754
.
$ gawk 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>
2 3 <Another person
487 High Street
(523) 643-8754>
.
$ gawk --posix 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>
2 3 <Another person
487 High Street
(523) 643-8754>
.
$ gawk 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name
123 High Street
(222) 466-1234>
2 0 <>
3 1 <Another person
487 High Street
(523) 643-8754>
.
$ gawk --posix 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name>
2 1 <123 High Street>
3 1 <(222) 466-1234>
4 0 <>
5 0 <>
6 0 <>
7 1 <Another person>
8 1 <487 High Street>
9 1 <(523) 643-8754>
10 0 <>
Note the different values for NR and NF, and the different $0 contents being printed.
Upvotes: 5
Reputation: 50795
Because the POSIX awk specification says so:
If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
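The part of that rule about leading and trailing blank lines can be checked with a small sketch. The input below is invented for illustration: it starts with two blank lines, has three blank lines between the blocks, and ends with extra blank lines. In paragraph mode none of those should produce an empty record:

```shell
# Hypothetical input: blank lines before, between, and after two blocks.
out=$(printf '\n\nA\nB\n\n\n\nC\n\n\n' |
  awk 'BEGIN { RS = "" } { printf "%d:%s\n", NR, $1 }')

# Print each record number with its first field.
echo "$out"
```

Only two records should be reported ("1:A" and "2:C"), regardless of how many blank lines surround them.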
Upvotes: 2