herophant

Reputation: 680

In awk, why are "" and "\n\n" treated the same for the RS parameter?

Here are the contents of the file:

Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754

These two commands give the same result:

$ awk 'BEGIN{FS="\n"; RS="\n\n"} {print $1, $3}' file_contents
$ awk 'BEGIN{FS="\n"; RS=""} {print $1, $3}' file_contents

The result given in both cases is:

Person Name (222) 466-1234
Another person (523) 643-8754

RS="\n\n" actually makes sense, but why is RS="" also treated the same way?

Upvotes: 2

Views: 388

Answers (2)

Ed Morton

Reputation: 204124

They aren't treated the same.

  • RS="" invokes paragraph mode in all awks: the input is split into records separated by contiguous sequences of empty lines, and a newline is added to the FS if the existing FS is a single character. (Note: the POSIX standard is incorrect in this area, as it implies \n would get added to any FS, but that's not the case; see https://lists.gnu.org/archive/html/bug-gawk/2019-04/msg00029.html)
  • RS="\n\n" works in GNU awk to set the record separator to a single blank line and does not affect FS. In awks that honor only the first character of RS, the second \n is ignored (POSIX leaves the result of more than one character in RS unspecified, so an awk COULD do anything, but that's by far the most common implementation).

Look at what happens when there are 3 blank lines between the 2 blocks of text and the FS is something other than \n (e.g. ,):

$ cat file
Person Name
123 High Street
(222) 466-1234



Another person
487 High Street
(523) 643-8754


$ gawk 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>

2 3 <Another person
487 High Street
(523) 643-8754>


$ gawk --posix 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>

2 3 <Another person
487 High Street
(523) 643-8754>


$ gawk 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name
123 High Street
(222) 466-1234>

2 0 <>

3 1 <Another person
487 High Street
(523) 643-8754>


$ gawk --posix 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name>

2 1 <123 High Street>

3 1 <(222) 466-1234>

4 0 <>

5 0 <>

6 0 <>

7 1 <Another person>

8 1 <487 High Street>

9 1 <(523) 643-8754>

10 0 <>

Note the different values for NR and NF and different $0 contents being printed.
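
The last output is consistent with the second \n being dropped: an awk that honors only the first character of RS effectively sees RS="\n". Here is a minimal sketch of that equivalence using a portable single-character RS. The helper names make_sample and per_line are made up for illustration, and the sample is assumed to end with one blank line, as the outputs above suggest:

```shell
# Hypothetical helper: recreate the sample input (two 3-line blocks,
# three blank lines between them, one trailing blank line).
make_sample() {
    printf '%s\n' 'Person Name' '123 High Street' '(222) 466-1234' '' '' '' \
                  'Another person' '487 High Street' '(523) 643-8754' ''
}

# Single-character RS: every input line, blank or not, becomes one record.
# FS="," never matches here, so NF is 1 for text lines and 0 for blank ones.
per_line() {
    make_sample | awk 'BEGIN { FS = ","; RS = "\n" } { print NR, NF, "<" $0 ">" }'
}

per_line
```

This should report 10 records, with records 4 to 6 and 10 empty, matching the record numbering in the gawk --posix run.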

Upvotes: 5

oguz ismail

Reputation: 50795

Because the POSIX awk specification says so:

If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
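
The part about leading and trailing blank lines can be checked with a small sketch. The input below is made up for illustration (it is not the file from the question), and the helper names padded_input and paragraphs are hypothetical:

```shell
# Hypothetical input: two leading blank lines, two blocks separated by
# three blank lines, and two trailing blank lines.
padded_input() {
    printf '%s\n' '' '' 'Person Name' '123 High Street' '' '' '' \
                  'Another person' '487 High Street' '' ''
}

# Paragraph mode (RS=""): each block is one record, and with FS="\n"
# each line of a block is one field.
paragraphs() {
    padded_input | awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " | " $2 }'
}

paragraphs
```

If the awk in use follows the quoted rule, only two records should come out, numbered 1 and 2, with no empty record for the blank lines at either end of the input.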

Upvotes: 2
