AWK - understanding different outputs obtained using FILENAME and OFS

Question

Input File 1: clnd.csv

20180805,08/05/2018,w27_2018,WK27 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,27,WEEK 27,01,SUNDAY
20180812,08/12/2018,w28_2018,WK28 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,28,WEEK 28,01,SUNDAY
20180819,08/19/2018,w29_2018,WK29 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,29,WEEK 29,01,SUNDAY
20180826,08/26/2018,w30_2018,WK30 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,30,WEEK 30,01,SUNDAY

Input File 2: data.csv

w27_2018,257,1,26.20,0.00,24.26
w28_2018,257,1,7.97,0.00,24.26
w29_2018,257,1,34.86,0.00,24.26
w30_2018,257,1,3.29,0.00,24.26

GNU-Awk commands used:

awk -F, 'NR==FNR {y=substr($12,3,4); a[ARGV[2],$3]=y FS $3 FS $4; next} {$1=a[ARGV[2],$1]; } 1' OFS=, clnd.csv data.csv  >> my_report_1.csv
awk -F, 'NR==FNR {y=substr($12,3,4); a[ARGV[2],$3]=y FS $3 FS $4; next} {$1=a[FILENAME,$1]; } 1' OFS=, clnd.csv data.csv  >> my_report_2.csv
awk -F, -v OFS=, 'NR==FNR {y=substr($12,3,4); a[ARGV[2],$3]=y FS $3 FS $4; next} {$1=a[FILENAME,$1]; } 1' clnd.csv data.csv  >> my_report_3.csv

Output obtained: cat my_report_?.csv

==> my_report_1.csv <==

2018,w27_2018,WK27 2018,257,1,26.20,0.00,24.26
2018,w28_2018,WK28 2018,257,1,7.97,0.00,24.26
2018,w29_2018,WK29 2018,257,1,34.86,0.00,24.26
2018,w30_2018,WK30 2018,257,1,3.29,0.00,24.26

==> my_report_2.csv <==

,257,1,26.20,0.00,24.26
,257,1,7.97,0.00,24.26
,257,1,34.86,0.00,24.26
,257,1,3.29,0.00,24.26

==> my_report_3.csv <==

2018,w27_2018,WK27 2018,257,1,26.20,0.00,24.26
2018,w28_2018,WK28 2018,257,1,7.97,0.00,24.26
2018,w29_2018,WK29 2018,257,1,34.86,0.00,24.26
2018,w30_2018,WK30 2018,257,1,3.29,0.00,24.26

Can you please explain why these outputs are different? My understanding was that FILENAME holds the name of the file currently being read, and that setting OFS with -v before the program or as an assignment after it, as I have done, shouldn't make any difference, since either way it is set before any record is read. Thanks in advance!

P.S.: I am using GNU Awk 3.1.7 on Oracle Linux Server release 6.10. The expected output is what appears in my_report_1.csv and my_report_3.csv.

jas · Accepted Answer

The difference is that passing OFS=, as an operand after the program text, instead of with the -v option, changes the numbering of the arguments: an operand assignment occupies a slot in ARGV, while a -v assignment does not.

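In particular, this is a problem for your second example, because it depends on ARGV[2] having the same value as FILENAME while the second file is processed. With OFS=, sitting in ARGV[1], ARGV[2] is clnd.csv, so the array is filled with keys that start with clnd.csv but is looked up with keys that start with data.csv, and every lookup returns an empty string. The first example works because it uses ARGV[2] both when storing and when looking up; the third works because with -v, ARGV[2] really is data.csv.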
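To make the mismatch concrete, here is a small sketch (not from the original post) that prints ARGV[2] next to FILENAME on the first record of the second file, once for each way of setting OFS. The scratch directory, the one-line stand-in files, and the variable names prog, operand_out and option_out are all assumptions made for illustration.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1
printf 'x\n' > clnd.csv   # stand-in content; only the file names matter here
printf 'y\n' > data.csv

# Report both values once, on the first record of the second file.
prog='NR != FNR && FNR == 1 { print "ARGV[2]=" ARGV[2], "FILENAME=" FILENAME }'

# OFS=, as an operand: it occupies ARGV[1], so ARGV[2] is clnd.csv.
operand_out=$(awk "$prog" OFS=, clnd.csv data.csv)

# OFS=, via -v: it is not part of ARGV, so ARGV[2] is data.csv.
option_out=$(awk -v OFS=, "$prog" clnd.csv data.csv)

printf '%s\n%s\n' "$operand_out" "$option_out"
# prints:
# ARGV[2]=clnd.csv,FILENAME=data.csv
# ARGV[2]=data.csv,FILENAME=data.csv
```

Only in the -v form do the two values agree, which is why a key stored under ARGV[2] can be found again through FILENAME in the third command but not in the second. If the file name isn't actually needed in the key, indexing the array by $3 alone would likely sidestep the issue altogether.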

You can see how the argument numbering differs here:

$ gawk -v OFS=, -f a.awk a b
    ARGV[0] = gawk
    ARGV[1] = a
    ARGV[2] = b

$ gawk -f a.awk OFS=, a b
    ARGV[0] = gawk
    ARGV[1] = OFS=,
    ARGV[2] = a
    ARGV[3] = b
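
The contents of a.awk aren't shown above. A minimal sketch that should produce this kind of listing, assuming the script does nothing more than loop over ARGV in a BEGIN block, might look like the following. It calls plain awk rather than gawk so it runs wherever any awk is installed, which means ARGV[0] may print differently.

```shell
# Assumed reconstruction of a.awk; the original script is not shown.
cat > a.awk <<'EOF'
BEGIN {
    # ARGC counts ARGV[0] plus every operand, including
    # var=value assignments such as OFS=,
    for (i = 0; i < ARGC; i++)
        printf "    ARGV[%d] = %s\n", i, ARGV[i]
}
EOF

# OFS set with -v: the file operands start at ARGV[1].
awk -v OFS=, -f a.awk a b

# OFS set as an operand: it takes ARGV[1] and shifts the files up by one.
awk -f a.awk OFS=, a b
```

Because the program consists only of a BEGIN block, awk exits without opening a or b, so those files don't need to exist for the listing to work.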
