user2473726

Reputation: 43

Return number of records per file after processing multiple files

I have the following code, which runs on multiple tab-delimited text files. For each file it sums a particular field and counts the number of records. The output is one line per file: filename, sum of the field, count of records. The sums work well. The only issue is that instead of the record count per file, I get the cumulative count for the whole batch of files processed. How can I fix this? I tried replacing 'NR' with 'FNR', but that didn't work either.

I am calling awk from a .bat file:

awk -f SumColumnRecordCount.awk *.txt

This is the code in the awk file:

BEGIN { FS="\t" }
{ sum[FILENAME] += $42 }
END {
    for (i=1;i<ARGC;i++)
    printf "%s %15d %d\n",ARGV[i],sum[ARGV[i]],NR >>"output.txt"
}

I am running the .bat file on Windows 7 with gawk (GNU awk).

Upvotes: 1

Views: 293

Answers (3)

Scrutinizer

Reputation: 9936

Try this quick adaptation of your code (it should also work with GNU awk < 4 and non-GNU awks):

BEGIN { FS="\t" }
{ sum[FILENAME] += $42
  last[FILENAME] = FNR }
END {
    for (i=1;i<ARGC;i++)
    printf "%s %15d %d\n",ARGV[i],sum[ARGV[i]],last[ARGV[i]] >>"output.txt"
}

A version without arrays (it should also work with GNU awk < 4 and non-GNU awks):

BEGIN { 
  FS="\t" 
}
function pr() {
  printf "%s %15d %d\n", f, sum, last >>"output.txt"
}
FNR==1 { 
  if(NR>1) pr()
  sum=last=0
  f=FILENAME
}
{ 
  sum+=$42
  last++
}
END {
  pr()
}

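Edit: if one or more input files are completely empty, the second version will not print a line with a 0 sum and 0 count for them, because an empty file has no records and so never triggers the FNR==1 rule. The first version still prints them, since it loops over ARGV. (Thanks @edmorton.)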
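To see the difference between NR and FNR, and how the array version treats an empty file, here is a minimal sketch as a shell script. The file names and their contents are made up for illustration, and column 2 stands in for column 42 of the real data. It assumes a POSIX shell with mktemp and any POSIX awk.

```shell
#!/bin/sh
# Hypothetical sample data: two small tab-delimited files plus one empty file.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '1\t10\n2\t20\n3\t30\n' > a.txt
printf '1\t5\n2\t7\n' > b.txt
: > empty.txt

# Print: filename, sum of column 2, per-file count (last FNR seen), and NR.
# NR is read in END, so it is the total across all files on every line.
result=$(awk '
BEGIN { FS = "\t" }
{ sum[FILENAME] += $2; last[FILENAME] = FNR }
END {
    for (i = 1; i < ARGC; i++)
        printf "%s %d %d %d\n", ARGV[i], sum[ARGV[i]], last[ARGV[i]], NR
}' a.txt b.txt empty.txt)

echo "$result"
# a.txt 60 3 5
# b.txt 12 2 5
# empty.txt 0 0 5
```

The third column is the per-file count and the fourth is the cumulative count the question was getting. The empty file still gets a line because the loop runs over ARGV rather than over files that produced records.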

Upvotes: 1

Adrian Frühwirth

Reputation: 45646

If you have GNU awk 4 then you can use BEGINFILE/ENDFILE to achieve this:

BEGINFILE { sum = 0; FS="\t" }
          { sum += $42 }
ENDFILE   { printf "%s %15d %d\n", FILENAME, sum, FNR }

Upvotes: 1

Steve

Reputation: 54532

GNU awk provides the ENDFILE special pattern, which runs after the last record of each input file. Therefore, all you really need is:

BEGIN {
    FS="\t"
}

{
    sum += $42
}

ENDFILE {
    printf "%s %15d %d\n", FILENAME, sum, FNR > "output.txt"
    sum = 0
}

Upvotes: 4
