merlin2011

Reputation: 75575

What does the 0 mean in awk '/matched/,0' file?

I recently came across the following answer which was very useful but gave no context about why the commands worked.

awk '/matched/,0' file

What does the 0 mean in the context of this awk command?

To expand on this, I'd like to understand if the literal string 0 has some special meaning in the context of the comma operator, and whether it has special meaning in other places in awk.

For example, awk '/matched/,1' file seems to have the same behavior as awk '/matched/' file, which is just to match lines that have matched in them.

The documentation I found seems to make no mention of 0 when used as a substitute for a pattern.

Upvotes: 1

Views: 540

Answers (2)

Ed Morton

Reputation: 203712

<condition 1>,<condition 2> in awk and other tools is a "range expression" which means "match the set of lines starting when condition 1 is true and ending when condition 2 is true".

0 is a false condition, so it is never true, and the block of lines therefore continues until the end of the input.

Your specific range expression is:

/matched/,0

which is awk shorthand for:

match the lines starting when the condition $0 ~ /matched/ is true and ending when the condition 0 is true (i.e. never, so the range runs to the end of the file).
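To make the role of the end condition concrete, here is a minimal sketch that contrasts an end condition of 0 with an end condition of 1. The five-line input is made up for illustration and the variable names are hypothetical.

```shell
# Hypothetical sample input; the contents are invented for this illustration.
input='one
matched two
three
matched four
five'

# End condition 0 is never true, so the range stays open until the end of input.
to_end=$(printf '%s\n' "$input" | awk '/matched/,0')

# End condition 1 is always true, so each range closes on the same line that opened it.
same_line=$(printf '%s\n' "$input" | awk '/matched/,1')

printf 'with 0:\n%s\n\nwith 1:\n%s\n' "$to_end" "$same_line"
```

With 0 the output runs from the first matching line to the last line of input; with 1 only the lines that themselves contain matched are printed, which is why /matched/,1 behaves like plain /matched/.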

Don't ever use range expressions: they make trivial tasks slightly briefer than using a flag, but anything slightly more interesting then requires a complete rewrite or duplicate conditions. See "Is a /start/,/end/ range expression ever useful in awk?" for details.
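For comparison, a flag-based equivalent might look like the sketch below. The flag name f and the sample input are assumptions chosen for illustration; the last command shows the kind of small variation (skipping the matching line itself) that only needs the two statements reordered when a flag is used.

```shell
# Hypothetical sample input; the contents are invented for this illustration.
input='one
matched two
three
matched four
five'

# Range expression: print from the first matching line to the end of input.
range_out=$(printf '%s\n' "$input" | awk '/matched/,0')

# Flag version of the same thing: set f on a match, print every line while f is set.
flag_out=$(printf '%s\n' "$input" | awk '/matched/{f=1} f')

# Variation: test the flag before setting it, so the first matching line is not printed.
after_out=$(printf '%s\n' "$input" | awk 'f; /matched/{f=1}')

printf 'range:\n%s\n\nflag:\n%s\n\nafter match only:\n%s\n' "$range_out" "$flag_out" "$after_out"
```

The first two commands print the same lines. The third prints only what follows the first match, something the range form cannot express without restating the start condition.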

Upvotes: 4

RavinderSingh13

Reputation: 133545

Ed Morton has already explained that the end condition is false; I am adding an example here.

Let's say we have the following Input_file:

cat Input_file
test test test test test
test test test test test
>Cluster 145
0       4772nt, >CL1798.Contig5_All... at +/98.49%
1       4782nt, >CL1798.Contig8_All... *
2       4781nt, >CL1798.Contig10_All... at +/99.27%
3       4773nt, >CL1798.Contig11_All... at +/99.25%

Now we will try the OP's command:

awk '/>Cluster 145/,0' Input_file
>Cluster 145
0       4772nt, >CL1798.Contig5_All... at +/98.49%
1       4782nt, >CL1798.Contig8_All... *
2       4781nt, >CL1798.Contig10_All... at +/99.27%
3       4773nt, >CL1798.Contig11_All... at +/99.25%

To make this clearer, let's intentionally provide an end condition that never becomes true anywhere in Input_file. For example, the following range runs from the line containing >Cluster 145 to a line containing singh, but that string never appears in Input_file:

awk '/>Cluster 145/,/singh/' Input_file
>Cluster 145
0       4772nt, >CL1798.Contig5_All... at +/98.49%
1       4782nt, >CL1798.Contig8_All... *
2       4781nt, >CL1798.Contig10_All... at +/99.27%
3       4773nt, >CL1798.Contig11_All... at +/99.25%

We see the same result as with 0 as the end condition. So 0 means we are supplying an end condition that is always false: it is never satisfied before the end of Input_file, and thus everything from the first matching line to the end of Input_file is printed.



From the gawk documentation (see the complete section on range patterns there): to turn the range off, the end pattern has to match, which never happens in the case of /matched/,0.

awk '$1 == "on", $1 == "off"' myfile

prints every record in myfile between ‘on’/‘off’ pairs, inclusive. A range pattern starts out by matching begpat against every input record. When a record matches begpat, the range pattern is turned on, and the range pattern matches this record as well. As long as the range pattern stays turned on, it automatically matches every input record read. The range pattern also matches endpat against every input record; when this succeeds, the range pattern is turned off again for the following record. Then the range pattern goes back to checking begpat against each record.
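The on/off behaviour described in that passage can be tried with a small sketch. The seven records below are made up for illustration; note that the second on is never followed by off, so the range simply stays open until the end of input, which is the same situation an end condition of 0 creates on purpose.

```shell
# Hypothetical records; invented for this illustration.
records='a
on
b
off
c
on
d'

# The range turns on at each "on" record and off after the next "off" record.
pairs=$(printf '%s\n' "$records" | awk '$1 == "on", $1 == "off"')

printf '%s\n' "$pairs"
```

The records a and c fall outside any range and are skipped, while d is printed because the second range is never closed.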

Upvotes: 1
