Reputation: 735
How can I count the number of times the value in a given field falls within a given range of numbers?
For example, the raw text file foo.txt is shown below:
2,3,4,2,4
2,3,4,32,4
2,3,4,12,4
2,3,4,4,4
2,3,4,,4
2,3,4,15,4
2,3,4,15,4
I want to count the number of times the value in field #4 falls within each of the following ranges: [0,10) and [10,20), where the lower bound is inclusive and the upper bound is exclusive.
The result should be:
range 0-10: 2 range 10-20: 3
Here is my awk code, but I am getting 8600001 for both ranges when I run awk -f prog.awk foo.txt:
#!/usr/range/awk
# prog.awk
BEGIN {
FS=",";
$range1=0;
$range2=0;
}
$4 ~ /[0-9]/ && $4 >= 0 && $4 < 10 { $range1 += 1 };
$4 ~ /[0-9]/ && $4 >= 10 && $4 < 20 { $range2 += 1 };
END {
print $range1, "\t", $range2;
}
Upvotes: 0
Views: 790
Reputation: 67547
another awk
$ awk -F, '$4>=0{a[int($4/10)]++}
END{print "range 0-10:" a[0],"range 10-20:" a[1]}' file
range 0-10:2 range 10-20:3
This can easily be extended to cover the full range of values:
$ awk -F, '$4>=0{a[int($4/10)]++}
END{for(k in a) print "range ["k*10"-"(k+1)*10"):", a[k]}' file
range [0-10): 2
range [10-20): 3
range [30-40): 1
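Two caveats with the loop above: awk does not guarantee any particular order for for (k in a), and bins that never occur (here [20-30)) are simply not printed. A minimal sketch that addresses both, assuming a hypothetical bin-width variable w and a helper variable max that tracks the highest bin seen (neither name comes from the answer), could look like this:

```shell
# Sample data from the question, kept in a shell variable for the demo.
data='2,3,4,2,4
2,3,4,32,4
2,3,4,12,4
2,3,4,4,4
2,3,4,,4
2,3,4,15,4
2,3,4,15,4'

# w (bin width) and max (highest bin index seen) are illustrative names.
binned=$(printf '%s\n' "$data" | awk -F, -v w=10 '
$4 >= 0 {
    k = int($4 / w)          # bin index for this value
    a[k]++
    if (k > max) max = k
}
END {
    # Walk the bins numerically so the order is fixed and gaps show as 0.
    for (k = 0; k <= max; k++)
        printf "range [%d-%d): %d\n", k * w, (k + 1) * w, a[k]
}')
echo "$binned"
```

Changing -v w=10 to another width regroups the counts without touching the program.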
Upvotes: 3
Reputation: 113944
$ awk -F, '0<=$4 && $4<10{a++} 10<=$4 && $4<20{b++} END{printf "range 0-10: %i range 10-20: %i\n",a,b}' foo.txt
range 0-10: 2 range 10-20: 3
0<=$4 && $4<10{a++}
This counts every time the fourth field is in [0,10).
10<=$4 && $4<20{b++}
This counts every time the fourth field is in [10,20).
END{printf "range 0-10: %i range 10-20: %i\n",a,b}
After we have finished reading the file, this prints out the results in the desired format.
For those who prefer their code spread over multiple lines:
awk -F, '
0<=$4 && $4<10 {
a++
}
10<=$4 && $4<20{
b++
}
END{
printf "range 0-10: %i range 10-20: %i\n", a, b
}
' foo.txt
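The patterns above skip the line with the empty fourth field only because awk compares an empty field as a string, and the empty string sorts before "0". If you would rather make that explicit, here is a sketch that validates the field first. It assumes only non-negative integers should be counted, which the question does not state; the variable name strict is just for the demo:

```shell
# Sample data from the question, kept in a shell variable for the demo.
data='2,3,4,2,4
2,3,4,32,4
2,3,4,12,4
2,3,4,4,4
2,3,4,,4
2,3,4,15,4
2,3,4,15,4'

strict=$(printf '%s\n' "$data" | awk -F, '
$4 !~ /^[0-9]+$/ { next }      # skip empty or non-integer fields
$4 < 10          { a++; next } # [0,10)
$4 < 20          { b++ }       # [10,20)
END { printf "range 0-10: %d range 10-20: %d\n", a, b }
')
echo "$strict"
```

Note that the anchors ^ and $ matter: the unanchored /[0-9]/ from the question matches any field that merely contains a digit.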
In awk, $range1 is the value of the field whose number is range1. This is not what you want. If you are not referencing a field number, do not use $. Thus:
BEGIN {
FS=",";
range1=0;
range2=0;
}
$4 ~ /[0-9]/ && $4 >= 0 && $4 < 10 { range1 += 1 };
$4 ~ /[0-9]/ && $4 >= 10 && $4 < 20 { range2 += 1 };
END {
print range1, "\t", range2;
}
Note that initializing the range variables to zero is not necessary: zero is the default value for a numeric variable.
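To see the difference between a variable and a field reference, here is a small illustration on a made-up input line (the data and the variable names n and x are not from the question). Because an unset variable evaluates to 0, $x with x unset is $0, the whole record, which is what $range1 referred to in the original program:

```shell
# n holds the number 4; $n is field 4; $(n - 1) is field 3.
fields=$(echo 'a,b,c,d' | awk -F, '{ n = 4; print n, $n, $(n - 1) }')
echo "$fields"

# x is never assigned, so x + 0 is 0 and $x is $0 (the entire line).
whole=$(echo 'a,b,c,d' | awk -F, '{ print x + 0, $x }')
echo "$whole"
```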
Upvotes: 3