Reputation: 75575
I recently came across the following answer, which was very useful but gave no context about why the command worked.
awk '/matched/,0' file
What does the 0 mean in the context of this awk command?
To expand on this, I'd like to understand whether the literal 0 has some special meaning in the context of the comma operator, and whether it has special meaning in other places in awk.
For example, awk '/matched/,1' file seems to have the same behavior as awk '/matched/' file, which is just to match lines that have matched in them.
The documentation I found seems to make no mention of 0 when used as a substitute for a pattern.
Upvotes: 1
Views: 540
Reputation: 203712
<condition 1>,<condition 2> in awk and other tools is a "range expression", which means "match the set of lines starting when condition 1 is true and ending when condition 2 is true".
0 is a false condition, so it is never true, and the block of lines therefore continues until the end of the file.
Your specific range expression is:
/matched/,0
which is awk shorthand for:
match the lines starting when the condition $0 ~ /matched/ is true and ending when the condition 0 is true (i.e. never, so the range runs to the end of the file).
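As a minimal sketch of the two end conditions from the question (the sample input below is made up for illustration): with 0 the range never closes, while with 1 the end condition is already true on the record that opened the range, so only the matching lines themselves are printed.

```shell
# Hypothetical sample input, used only for illustration.
sample=$(printf '%s\n' 'a' 'matched 1' 'b' 'matched 2' 'c')

# End condition 0 is never true: the range runs from the first match to end of input.
from_match=$(printf '%s\n' "$sample" | awk '/matched/,0')
printf '%s\n' "$from_match"

# End condition 1 is always true: each range closes on the record that opened it.
only_matches=$(printf '%s\n' "$sample" | awk '/matched/,1')
printf '%s\n' "$only_matches"
```

This is why /matched/,1 looks the same as plain /matched/: the end condition is tested against the same record that started the range.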
Don't ever use range expressions: they make trivial tasks slightly briefer than using a flag, but anything slightly more interesting then requires a complete rewrite or duplicate conditions. See "Is a /start/,/end/ range expression ever useful in awk?" for details.
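A rough sketch of the flag approach, assuming a made-up sample input and a flag variable named f (both are illustrative choices): the first command behaves like /matched/,0, and the second shows how a small change in requirements (skip the matching line itself) only needs the two statements reordered.

```shell
# Hypothetical sample input, used only for illustration.
sample=$(printf '%s\n' 'a' 'matched 1' 'b' 'matched 2' 'c')

# Flag version of /matched/,0: set f at the first match, then print while f is set.
with_start=$(printf '%s\n' "$sample" | awk '/matched/{f=1} f')
printf '%s\n' "$with_start"

# Excluding the first matching line: test the flag before setting it.
after_start=$(printf '%s\n' "$sample" | awk 'f; /matched/{f=1}')
printf '%s\n' "$after_start"
```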
Upvotes: 4
Reputation: 133545
Ed has already explained well that the end condition is FALSE; I am adding an example here to illustrate it.
Let's say we have the following Input_file:
cat Input_file
test test test test test
test test test test test
>Cluster 145
0 4772nt, >CL1798.Contig5_All... at +/98.49%
1 4782nt, >CL1798.Contig8_All... *
2 4781nt, >CL1798.Contig10_All... at +/99.27%
3 4773nt, >CL1798.Contig11_All... at +/99.25%
Now we will try OP's given command:
awk '/>Cluster 145/,0' Input_file
>Cluster 145
0 4772nt, >CL1798.Contig5_All... at +/98.49%
1 4782nt, >CL1798.Contig8_All... *
2 4781nt, >CL1798.Contig10_All... at +/99.27%
3 4773nt, >CL1798.Contig11_All... at +/99.25%
To make this clearer, let's deliberately provide an end condition that never becomes TRUE anywhere in Input_file. For example, the range below starts at the line matching />Cluster 145/ and would end at a line matching /singh/, but that string never appears in Input_file:
awk '/>Cluster 145/,/singh/' Input_file
>Cluster 145
0 4772nt, >CL1798.Contig5_All... at +/98.49%
1 4782nt, >CL1798.Contig8_All... *
2 4781nt, >CL1798.Contig10_All... at +/99.27%
3 4773nt, >CL1798.Contig11_All... at +/99.25%
We see the same result as when 0 was given as the end condition. So 0 simply means the end condition is FALSE: it is never matched before the END of Input_file, and therefore everything from the first matching line to the end of Input_file is printed.
From the gawk documentation (see the complete section on range patterns there): to turn the range off, the end pattern must match some record, which never happens in the case of /matched/,0. The relevant passage:
awk '$1 == "on", $1 == "off"' myfile
prints every record in myfile between ‘on’/‘off’ pairs, inclusive. A range pattern starts out by matching begpat against every input record. When a record matches begpat, the range pattern is turned on, and the range pattern matches this record as well. As long as the range pattern stays turned on, it automatically matches every input record read. The range pattern also matches endpat against every input record; when this succeeds, the range pattern is turned off again for the following record. Then the range pattern goes back to checking begpat against each record.
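The on/off behavior quoted above can be sketched with a small made-up input (the records below are illustrative only). The second "on" has no closing "off", so that range stays on until the end of the input, which is the same situation as an end condition of 0.

```shell
# Hypothetical sample input, used only for illustration.
sample=$(printf '%s\n' 'x' 'on' 'a' 'off' 'y' 'on' 'b')

# Records between on/off pairs are printed inclusively; 'x' and 'y' fall outside any range.
ranges=$(printf '%s\n' "$sample" | awk '$1 == "on", $1 == "off"')
printf '%s\n' "$ranges"
```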
Upvotes: 1