Reputation: 109
I am trying to understand gawk in shell scripting. The command below is meant to count the number of paragraphs, where two (or more) consecutive newlines mark the end of a paragraph.
gawk 'END{print "Number of paragraphs: "NR}' RS="" tmp.txt
How does it work?
Upvotes: 1
Views: 694
Reputation: 753455
The GNU awk manual says of RS:
The empty string "" (a string without any characters) has a special meaning as the value of RS. It means that records are separated by one or more blank lines and nothing else.
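To make the quoted behaviour concrete, here is a minimal sketch that compares the default record separator with RS="". The sample text and the temporary file are made up for illustration, and the AWK variable is only a convenience so the snippet also runs where gawk is not installed.

```shell
# Prefer gawk; fall back to whatever awk is installed (assumed POSIX-compatible).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample: three paragraphs separated by one or more blank lines.
sample=$(mktemp)
printf 'first paragraph\nsecond line\n\nsecond paragraph\n\n\n\nthird paragraph\n' > "$sample"

# Default RS (newline): every line, including each blank one, is a record.
lines=$("$AWK" 'END { print NR }' "$sample")

# RS="" (paragraph mode): a run of blank lines separates records.
paragraphs=$("$AWK" 'END { print NR }' RS="" "$sample")

echo "lines: $lines, paragraphs: $paragraphs"
rm -f "$sample"
```

With this input the first count is 8 and the second is 3: the run of three blank lines before the last paragraph counts as a single separator.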
So, your program can be parsed as follows:
gawk 'END{print "Number of paragraphs: "NR}' RS="" tmp.txt
gawk is the command.
The gawk script is END{print "Number of paragraphs: "NR} (the single quotes are removed by the shell). When the input ends, it prints the value of NR preceded by a phrase. NR is the number of records read. Note that this uses the implicit concatenation operator between the phrase and NR. It could also be written print "Number of paragraphs:", NR and it would produce the same result.
RS="" is actually seen by gawk as RS= (the double quotes are removed by the shell). This sets the special mode referenced from the manual. Here, two or more consecutive newlines will be counted as the end of a paragraph, as will EOF.
tmp.txt is the input file.
So, the command works because of a special case built into gawk.
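As a quick check of the points above, the following sketch runs the script in its concatenation form, in the comma form, and with an unquoted RS=. The two-paragraph input is a made-up example, and the AWK variable is an assumption that lets the snippet fall back to another awk when gawk is missing.

```shell
# Prefer gawk; fall back to whatever awk is installed (assumed POSIX-compatible).
AWK=$(command -v gawk || command -v awk)

# Hypothetical two-paragraph input; the last paragraph is ended by EOF.
sample=$(mktemp)
printf 'alpha\nbeta\n\ngamma\n' > "$sample"

# Concatenation by juxtaposition: the string itself ends in a space.
concat=$("$AWK" 'END { print "Number of paragraphs: " NR }' RS="" "$sample")

# Comma form: print joins its arguments with OFS, a single space by default.
comma=$("$AWK" 'END { print "Number of paragraphs:", NR }' RS="" "$sample")

# The shell strips the quotes, so RS= reaches awk as the same argument as RS="".
bare=$("$AWK" 'END { print "Number of paragraphs: " NR }' RS= "$sample")

printf '%s\n%s\n%s\n' "$concat" "$comma" "$bare"
rm -f "$sample"
```

All three invocations print Number of paragraphs: 2 for this input.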
Everything in this discussion also applies to standard awk.
Upvotes: 3