abalter

Reputation: 10383

How to get number of fields in AWK prior to processing

I would like to print a header for a file in the BEGIN block of my awk script, but to do that I need to know how many fields there are. I could check for NR==1 in the main section, but that condition would be evaluated on every row, slowing things down.

Below is my attempt using a one-liner.

fields.txt

a   1
b   2
c   3

Result:

awk 'NR==1{a=NF; print "before begin, there are ", a, "fields"}BEGIN{print "there are ", a, "fields"}{print a"\t"$0}END{print "there were", a, "fields"}' fields.txt
there are   fields
before begin, there are  2 fields
2   a   1
2   b   2
2   c   3
there were 2 fields

I guess the BEGIN block still gets evaluated before the preceding block. Have I really accomplished my goal, or is the NR==1 check still getting evaluated on each line?

EDIT: To put in perspective why I'm trying to do it this way:

  1. I've got a file with say 100k rows and 40 columns
  2. This file is the output of another process in a pipeline, with the awk script being the last step
  3. I'm calculating two new columns based on other columns and adding these to the output
  4. I want the final file to include a header that reflects the two new added columns

Upvotes: 0

Views: 859

Answers (2)

Ed Morton

Reputation: 203349

It sounds like this is what you're trying to do:

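awk '
  # Read the first line in BEGIN only to learn NF; the main loop still reads every line of the file.
  BEGIN {if ((getline < ARGV[1]) > 0) a=NF; close(ARGV[1]); print "there are", a, "fields"}
  {print a"\t"$0}
  END {print "there were", a, "fields"}
' file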
there are 2 fields
2       a   1
2       b   2
2       c   3
there were 2 fields

but I don't know if it's worthwhile, given the tiny performance impact of an NR==1 check relative to whatever other transformations you're going to perform on the data.
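For comparison, here is a minimal sketch of the plain NR==1 approach applied to the header use case from the question. The function name add_header, the header labels (col1, new1, ...) and the two calculations are placeholders I made up for illustration, not anything from the original script.

```shell
# Hypothetical sample input: two whitespace-separated columns.
sample=$(mktemp)
printf 'a\t1\nb\t2\nc\t3\n' > "$sample"

# add_header is an illustrative name; swap in your real header text and calculations.
add_header() {
  awk -v OFS='\t' '
    NR == 1 {                         # body runs once, when NF is first known
      for (i = 1; i <= NF; i++) printf "col%d%s", i, OFS
      print "new1", "new2"            # placeholder names for the two added columns
    }
    { print $0, $2 * 2, $2 + 1 }      # placeholder calculations for the new columns
  ' "$1"
}

add_header "$sample"
```

The NR==1 comparison is a single integer test per record, so on 100k rows it should be lost in the noise next to the field splitting and printing awk already does.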

Make sure you read and fully understand all of the implications of using getline at http://awk.freeshell.org/AllAboutGetline if you're considering using it.

Upvotes: 3

JNevill

Reputation: 50034

I'm not sure that awk doing the NR==1 check on each row would slow it down much at all. If that really is a concern, then perhaps do your initial field count outside of your current awk script and pass it into the script as a variable. Something like:

fieldCount=$(head -n 1 fields.txt | awk '{print NF}')
awk -v a="$fieldCount" 'BEGIN{print "there are ", a, "fields"}{print a"\t"$0}END{print "there were", a, "fields"}' fields.txt
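The commands above read fields.txt twice, which is not possible when the data arrives on a pipe, as the question's edit describes. A rough sketch of the same idea for piped input follows, assuming a POSIX shell; count_then_process is a hypothetical helper name, and the printf at the end only stands in for the real upstream process.

```shell
# count_then_process is a hypothetical helper, not part of awk or the question.
count_then_process() {
  IFS= read -r first || return 0             # take the first line off the stream
  n=$(printf '%s\n' "$first" | awk '{print NF}')
  { printf '%s\n' "$first"; cat; } |         # put the first line back in front of the rest
    awk -v a="$n" 'BEGIN { print "there are", a, "fields" } { print a "\t" $0 }'
}

# Stand-in for the upstream pipeline stage.
printf 'a\t1\nb\t2\nc\t3\n' | count_then_process
```

With the sample input this prints "there are 2 fields" followed by the three rows, each prefixed with 2. It costs an extra awk process and a cat, so it is only worth it if the header really has to come out of BEGIN.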

Upvotes: 2
