MacMama

Reputation: 133

Multiple input files awk command line

I am an awk newbie and admittedly don't understand how FNR and NR drive looping through files. I have two input files working, and I need to add another (inputFile2).

I am running this from the command line:

awk -f parseField.awk inputFile1.csv inputFile2.csv ./inputFile3.TXT

Currently, I loop through inputFile3 using:

FNR!=NR {...}

I loop through inputFile1 using:

FNR==NR {...}

I need to add another file to the mix (inputFile2). What syntax can I use in my awk script (parseField.awk) to access that third input file?

Upvotes: 1

Views: 1783

Answers (2)

Mark Setchell

Reputation: 207345

This is not as elegant as the POSIX FILENAME solution, but it is handy for dusty old awks that lack many features. You can use a compound command that tags your data before it reaches awk, in a couple of ways...

Option 1

First, you could output the file number on a line of its own before each file that you send to awk. So, if your files look like this:

file1

Line 1 of 1

file2

Line 1 of 2
Line 2 of 2

file3

Line 1 of 3
Line 2 of 3
Line 3 of 3

You could do this:

{ echo 1; cat file1; echo 2; cat file2; echo 3; cat file3; }
1
Line 1 of 1
2
Line 1 of 2
Line 2 of 2
3
Line 1 of 3
Line 2 of 3
Line 3 of 3

Then pipe that into awk and pick up the file number whenever the number of fields is 1 (this relies on every data line having more than one field):

{ echo 1; cat file1; echo 2; cat file2; echo 3; cat file3; } | awk 'NF==1{file=$1;next} {print file,$0}'
1 Line 1 of 1
2 Line 1 of 2
2 Line 2 of 2
3 Line 1 of 3
3 Line 2 of 3
3 Line 3 of 3

Option 2

Or, you could add the file number to the start, or end, of every line so that it is available as $1 (or $NF) inside awk, like this:

{ sed 's/^/1 /' file1; sed 's/^/2 /' file2; sed 's/^/3 /' file3; }
1 Line 1 of 1
2 Line 1 of 2
2 Line 2 of 2
3 Line 1 of 3
3 Line 2 of 3
3 Line 3 of 3

So, now you can do

{ sed 's/^/1 /' file1; sed 's/^/2 /' file2; sed 's/^/3 /' file3; } | awk '{file=$1; ...}'
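
One possible way to fill in that awk body, as a rough sketch only: save the prefix, strip it from the record, and then branch on the saved number. The sample file contents and the "file N:" labels are made up for illustration and are not part of the original answer.

```shell
# Hypothetical sample files; the names mirror file1..file3 used above.
dir=$(mktemp -d)
printf 'Line 1 of 1\n' > "$dir/file1"
printf 'Line 1 of 2\nLine 2 of 2\n' > "$dir/file2"
printf 'Line 1 of 3\n' > "$dir/file3"

out=$(
    { sed 's/^/1 /' "$dir/file1"
      sed 's/^/2 /' "$dir/file2"
      sed 's/^/3 /' "$dir/file3"; } |
    awk '{
        file = $1                 # remember which file this line came from
        sub(/^[0-9]+ /, "")       # drop the prefix so $0 is the original line
        print "file " file ": " $0
    }'
)
printf '%s\n' "$out"
rm -r "$dir"
```

Because the prefix is removed before any further processing, the rest of the script can treat $0 and the fields exactly as it would for untagged input.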

I'm still voting for @fedorqui's solution though :-)

Upvotes: 1

fedorqui

Reputation: 289495

To add to @EtanReisner's good information, you can keep a counter: FNR==1 {file_number++}. This increments the counter whenever the first line of a file is read.

All together, you can say:

#!/bin/awk -f

BEGIN {print "start program"}
NR==1 {print "reading first file"}
FNR==1 {filenum++; print "I am in file number", filenum}
{ ... }
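
Applied to three inputs, the counter can select a different block per file. The following is a minimal sketch with made-up sample files and labels; the real per-file actions would replace the print statements.

```shell
# Hypothetical sample files standing in for the three inputs in the question.
dir=$(mktemp -d)
printf 'a\n' > "$dir/file1"
printf 'b\nc\n' > "$dir/file2"
printf 'd\n' > "$dir/file3"

out=$(awk '
    FNR == 1     { filenum++ }                   # a new input file has started
    filenum == 1 { print "first:",  $0; next }   # records of the 1st file
    filenum == 2 { print "second:", $0; next }   # records of the 2nd file
    filenum == 3 { print "third:",  $0 }         # records of the 3rd file
' "$dir/file1" "$dir/file2" "$dir/file3")
printf '%s\n' "$out"
rm -r "$dir"
```

One caveat: an empty input file never produces a record with FNR==1, so the counter does not advance for it and the later files are numbered one lower than their position on the command line.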

If you are in a POSIX awk (thanks Jonathan Leffler) you can also use the FILENAME variable, or the ARGC variable and the ARGV array.
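
As a rough sketch of the FILENAME and ARGV approach: ARGV[1], ARGV[2] and ARGV[3] hold the file operands in command-line order, so comparing them with FILENAME identifies the current file. The file names and labels below are invented for the example, and it assumes each file is named only once and that no var=value assignments appear among the arguments.

```shell
# Hypothetical sample files; only their order on the command line matters.
dir=$(mktemp -d)
printf 'a\n' > "$dir/one.csv"
printf 'b\n' > "$dir/two.csv"
printf 'c\n' > "$dir/three.txt"

out=$(awk '
    FILENAME == ARGV[1] { print "file 1:", $0; next }   # first operand
    FILENAME == ARGV[2] { print "file 2:", $0; next }   # second operand
    FILENAME == ARGV[3] { print "file 3:", $0 }         # third operand
' "$dir/one.csv" "$dir/two.csv" "$dir/three.txt")
printf '%s\n' "$out"
rm -r "$dir"
```

Unlike the counter, this keeps working when one of the files is empty, because the comparison does not depend on how many files have produced records so far.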


Also see information about this in Idiomatic awk:

Another construct that is often used in awk is as follows:

$ awk 'NR == FNR { # some actions
         next }
       # other condition
       { # other actions
       }' file1.txt file2.txt

This is used when processing two files. When processing more than one file, awk reads each file sequentially, one after another, in the order they are specified on the command line. The special variable NR stores the total number of input records read so far, regardless of how many files have been read. The value of NR starts at 1 and always increases until the program terminates. Another variable, FNR, stores the number of records read from the current file being processed. The value of FNR starts at 1, increases until the end of the current file is reached, then is set again to 1 as soon as the first line of the next file is read, and so on. So, the condition NR == FNR is only true while awk is reading the first file.
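
To see how the two variables behave with three files, the following small demonstration (sample files are made up) prints NR, FNR and whether NR == FNR holds for each record. It also shows why FNR!=NR alone cannot tell the second file from the third: the test is false for every record of both.

```shell
# Hypothetical sample files with 1, 2 and 1 lines respectively.
dir=$(mktemp -d)
printf 'x\n' > "$dir/f1"
printf 'y\nz\n' > "$dir/f2"
printf 'w\n' > "$dir/f3"

# Columns: NR, FNR, and "first" while NR == FNR, otherwise "later".
out=$(awk '{ print NR, FNR, (NR == FNR ? "first" : "later") }' \
    "$dir/f1" "$dir/f2" "$dir/f3")
printf '%s\n' "$out"
rm -r "$dir"
```

With these inputs the output is:

```
1 1 first
2 1 later
3 2 later
4 1 later
```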

Upvotes: 4
