Reputation: 2214
Given the following file:
>tna
ATGC
>ggf
TG
>gta
TGGTT
I want to find the length of the longest line that does not start with >. I've figured out the AWK code:
awk '!/>/ {len=length($0)} len > maximum {maximum=len} END{print maximum}' file
I cannot explain, however, how maximum is used before being defined. First, I make the comparison to len and then I set maximum to len. How does AWK know what maximum is?
Thanks!
Upvotes: 2
Views: 75
Reputation: 881113
By default, any string variables that are uninitialised are the empty string "" and any numerics are zero (they're actually all empty strings, but those are treated as zero in a numeric context).
But you might also want to consider the fact that />/ will match all lines containing >, not just those starting with it. You would be better off with something like (readable):
!/^>/ {
    len = length($0)
    if (len > max) {
        max = len
    }
}
END {
    print max
}
or, in its minimalist form:
!/^>/{len=length($0);if(len>max){max=len}}END{print max}
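To make the difference between />/ and /^>/ concrete, here is a minimal sketch. The input is hypothetical (it is not the asker's file): one sequence line deliberately contains > in the middle, and the helper function names are made up for illustration.

```shell
# Hypothetical input: the sequence line "AT>GC" contains ">" but does not start with it.
sample() { printf '>tna\nAT>GC\n>ggf\nTG\n'; }

# Unanchored pattern: skips every line containing ">", so "AT>GC" is ignored.
unanchored() {
  sample | awk '!/>/ { if (length($0) > max) max = length($0) } END { print max }'
}

# Anchored pattern: skips only lines that start with ">".
anchored() {
  sample | awk '!/^>/ { if (length($0) > max) max = length($0) } END { print max }'
}

unanchored   # prints 2
anchored     # prints 5
```

With the sample file from the question both patterns give the same answer, because > only ever appears at the start of a line there.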
Upvotes: 5
Reputation: 85530
Awk is a dynamically typed language: a variable's type changes from "untyped" to string or number depending on the context in which it is used. By default, variables are initialized to the empty string, which evaluates to zero in a numeric context. So you are essentially comparing len against a maximum of zero.
See 6.1.3.1 Using Variables in a Program and 6.3.2.1 String Type versus Numeric Type
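The behaviour can be checked directly with a small sketch. The variable x and the function name demo_uninit are made up for illustration; x is never assigned anywhere.

```shell
demo_uninit() {
  awk 'BEGIN {
      # x is never assigned, so it is both "" and 0 depending on context.
      if (x == "") print "string context: empty"
      if (x == 0)  print "numeric context: zero"
      print length(x)   # length of the empty string
      print x + 5       # arithmetic treats x as 0
  }'
}

demo_uninit
```

This prints both messages, then 0 and 5, which is why the first `len > maximum` comparison in the question behaves like `len > 0`.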
Upvotes: 3
Reputation: 133428
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '!/^>/{len=length($0);max=(len>max?len:max)} END{print max}' Input_file
Explanation: a detailed, commented version of the above.
awk '                        ##Start of the awk program.
!/^>/{                       ##If the line does not start with >, do the following.
  len=length($0)             ##Store the length of the current line in len.
  max=(len>max?len:max)      ##Set max to the greater of the current line length and the previous max.
}
END{                         ##Start of the END block, run after all input is read.
  print max                  ##Print the maximum length found.
}
' Input_file                 ##Name of the input file.
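One edge case follows from max starting out uninitialised: if no line passes the !/^>/ test, max is still the empty string at END, so print max emits an empty line. A possible workaround, sketched here on a hypothetical headers-only input with made-up function names, is to force a numeric context with max+0.

```shell
# Hypothetical input containing only header lines.
headers_only() { printf '>tna\n>ggf\n'; }

# max is never assigned, so an empty line is printed.
plain() {
  headers_only | awk '!/^>/{len=length($0);max=(len>max?len:max)} END{print max}'
}

# Adding 0 forces a numeric context, so 0 is printed instead.
forced() {
  headers_only | awk '!/^>/{len=length($0);max=(len>max?len:max)} END{print max+0}'
}

plain    # prints an empty line
forced   # prints 0
```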
Upvotes: 2