Reputation: 43
I have the following code, which runs on multiple tab-delimited text files. For each file it sums one field and counts the records. The output is filename, sum of the field, and record count, one line per file. Everything works except the count: instead of the number of records per file, I get the cumulative count for the whole batch of files. How can I fix this? I tried replacing 'NR' with 'FNR', but that didn't work either.
I am calling awk through a .bat file:
awk -f SumColumnRecordCount.awk *.txt
This is the code in the awk file:
BEGIN { FS="\t" }
{ sum[FILENAME] += $42 }
END {
for (i=1;i<ARGC;i++)
printf "%s %15d %d\n",ARGV[i],sum[ARGV[i]],NR >>"output.txt"
}
I am running the .bat file on Windows 7 with GAWK (GNU awk).
Upvotes: 1
Views: 293
Reputation: 9936
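Try this quick adaptation of your code (it should also work with GNU awk older than 4 and with non-GNU awks). It stores FNR for each file while the records are being read, because inside END, NR holds the total for the whole batch and FNR only the count of the last file:
BEGIN { FS="\t" }
{
    sum[FILENAME] += $42
    last[FILENAME] = FNR    # overwritten on every record; ends as the file's record count
}
END {
    for (i=1; i<ARGC; i++)
        printf "%s %15d %d\n", ARGV[i], sum[ARGV[i]], last[ARGV[i]] >>"output.txt"
}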
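To check the array version on something small, here is a minimal sketch. It assumes a POSIX shell with some awk on the PATH (not the .bat setup from the question). The files one.txt and two.txt, the use of column 2 in place of column 42, and the output name result.txt are all made up for the illustration.

```shell
# Hypothetical sample data: two tab-delimited files (column 2 stands in for $42).
printf 'a\t10\nb\t20\n' > one.txt
printf 'c\t5\nd\t6\ne\t7\n' > two.txt

# Record FNR per file while reading. In END, FNR alone would only
# describe the last file, and NR the whole batch.
awk 'BEGIN { FS = "\t" }
     { sum[FILENAME] += $2; last[FILENAME] = FNR }
     END {
       for (i = 1; i < ARGC; i++)
         printf "%s %d %d\n", ARGV[i], sum[ARGV[i]], last[ARGV[i]]
     }' one.txt two.txt > result.txt

cat result.txt
```

With this sample data, result.txt should contain "one.txt 30 2" and "two.txt 18 3", i.e. a separate record count for each file.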
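A version without arrays (it should also work with GNU awk older than 4 and with non-GNU awks):
BEGIN {
    FS="\t"
}
# Print the totals collected for the file that has just been read.
function pr() {
    printf "%s %15d %d\n", f, sum, last >>"output.txt"
}
# First record of a file: flush the previous file, then reset the counters.
FNR==1 {
    if(NR>1) pr()
    sum=last=0
    f=FILENAME
}
{
    sum+=$42
    last++
}
END {
    pr()    # totals for the final file
}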
Edit: if one or more input files are completely empty, the second version prints no line at all for them, whereas the first version prints the filename with a sum and count of 0 (thanks @edmorton).
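A minimal sketch of that empty-file difference, assuming a POSIX shell with some awk on the PATH. The files full.txt and empty.txt, the use of column 2 in place of column 42, and the output names arrays.out and noarrays.out are made up for the illustration.

```shell
# Hypothetical sample data: one file with a single record, one empty file.
printf 'a\t10\n' > full.txt
: > empty.txt

# Array version: END walks ARGV, so the empty file still gets a line.
awk 'BEGIN { FS = "\t" }
     { sum[FILENAME] += $2; last[FILENAME] = FNR }
     END {
       for (i = 1; i < ARGC; i++)
         printf "%s %d %d\n", ARGV[i], sum[ARGV[i]], last[ARGV[i]]
     }' full.txt empty.txt > arrays.out

# Version without arrays: a line is printed only for files that have
# at least one record, so empty.txt never shows up.
awk 'BEGIN { FS = "\t" }
     function pr() { printf "%s %d %d\n", f, sum, last }
     FNR == 1 { if (NR > 1) pr(); sum = last = 0; f = FILENAME }
     { sum += $2; last++ }
     END { pr() }' full.txt empty.txt > noarrays.out

cat arrays.out noarrays.out
```

Here arrays.out should have two lines ("full.txt 10 1" and "empty.txt 0 0"), while noarrays.out should have only the line for full.txt.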
Upvotes: 1
Reputation: 45646
If you have GNU awk 4, you can use BEGINFILE/ENDFILE to achieve this:
BEGINFILE { sum = 0; FS="\t" }
{ sum += $42 }
ENDFILE { printf "%s %15d %d\n", FILENAME, sum, FNR }   # FNR is per file; NR would be the cumulative count
Upvotes: 1
Reputation: 54532
GNU AWK gives you access to the ENDFILE special pattern, whose action runs after the last record of each input file. Therefore, all you really need is:
BEGIN {
    FS="\t"
}
{
    sum += $42
}
ENDFILE {
    printf "%s %15d %d\n", FILENAME, sum, FNR > "output.txt"
    sum = 0    # reset for the next file
}
Upvotes: 4