Input File 1: clnd.csv
20180805,08/05/2018,w27_2018,WK27 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,27,WEEK 27,01,SUNDAY
20180812,08/12/2018,w28_2018,WK28 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,28,WEEK 28,01,SUNDAY
20180819,08/19/2018,w29_2018,WK29 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,29,WEEK 29,01,SUNDAY
20180826,08/26/2018,w30_2018,WK30 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,30,WEEK 30,01,SUNDAY
Input File 2: data.csv
w27_2018,257,1,26.20,0.00,24.26
w28_2018,257,1,7.97,0.00,24.26
w29_2018,257,1,34.86,0.00,24.26
w30_2018,257,1,3.29,0.00,24.26
GNU Awk commands used:
awk -F, 'NR==FNR {y=substr($12,3,4); a[ARGV[2],$3]=y FS $3 FS $4; next} {$1=a[ARGV[2],$1]; } 1' OFS=, clnd.csv data.csv >> my_report_1.csv
awk -F, 'NR==FNR {y=substr($12,3,4); a[ARGV[2],$3]=y FS $3 FS $4; next} {$1=a[FILENAME,$1]; } 1' OFS=, clnd.csv data.csv >> my_report_2.csv
awk -F, -v OFS=, 'NR==FNR {y=substr($12,3,4); a[ARGV[2],$3]=y FS $3 FS $4; next} {$1=a[FILENAME,$1]; } 1' clnd.csv data.csv >> my_report_3.csv
Output obtained: cat my_report_?.csv
==> my_report_1.csv <==
2018,w27_2018,WK27 2018,257,1,26.20,0.00,24.26
2018,w28_2018,WK28 2018,257,1,7.97,0.00,24.26
2018,w29_2018,WK29 2018,257,1,34.86,0.00,24.26
2018,w30_2018,WK30 2018,257,1,3.29,0.00,24.26
==> my_report_2.csv <==
,257,1,26.20,0.00,24.26
,257,1,7.97,0.00,24.26
,257,1,34.86,0.00,24.26
,257,1,3.29,0.00,24.26
==> my_report_3.csv <==
2018,w27_2018,WK27 2018,257,1,26.20,0.00,24.26
2018,w28_2018,WK28 2018,257,1,7.97,0.00,24.26
2018,w29_2018,WK29 2018,257,1,34.86,0.00,24.26
2018,w30_2018,WK30 2018,257,1,3.29,0.00,24.26
Can you please explain why these outputs are different? My understanding was that FILENAME holds the name of the file currently being read, and that setting OFS at the beginning (with -v) or further along the command line, as I have done, shouldn't make any difference, since either way it should be set before any record is read. Thanks in advance!
P.S.: I am using GNU Awk 3.1.7 on Oracle Linux Server release 6.10. The expected output is what appears in my_report_1.csv and my_report_3.csv.
Answer:
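The difference is that you change the numbering of ARGV by passing OFS=, as a command-line operand (after the program text) instead of with the -v option. An operand of the form var=value is not a file, but it still occupies a slot in ARGV; awk performs the assignment when it reaches that operand while looking for the next input file. So in your first two commands every file name moves up by one position.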
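Your intuition about timing is right for records but not for BEGIN. A minimal sketch to check both points, assuming a POSIX sh and an awk on PATH (not verified on gawk 3.1.7); the one-line clnd.csv created in a scratch directory is only a stand-in for the real file:

```shell
#!/bin/sh
# Scratch directory with a throwaway stand-in for clnd.csv
cd "$(mktemp -d)" || exit 1
printf 'x\n' > clnd.csv

prog='BEGIN { print "BEGIN: OFS=[" OFS "] ARGC=" ARGC }
FNR == 1 { print FILENAME ": OFS=[" OFS "]" }'

# -v: assigned before BEGIN runs, and it takes no ARGV slot
with_v=$(awk -v OFS=, "$prog" clnd.csv)
# operand: assigned only when awk reaches it in ARGV, i.e. after BEGIN
as_operand=$(awk "$prog" OFS=, clnd.csv)

printf '%s\n---\n%s\n' "$with_v" "$as_operand"
```

With -v, OFS is already a comma in BEGIN and ARGC is 2. As an operand, OFS is still the default single space in BEGIN and ARGC is 3, but it has become a comma by the time the first record of clnd.csv is read.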
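In particular, this breaks your second example, which depends on ARGV[2] having the same value as FILENAME while the second file is processed. With OFS=, as an operand, ARGV[2] is clnd.csv, so the entries are stored under a["clnd.csv",week] but looked up under a["data.csv",week]; every lookup misses, and $1 is set to the empty string. Your first example works only because it uses ARGV[2] on both sides, and your third works because, with -v, ARGV[2] really is data.csv.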
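To see the mismatch directly, here is a minimal sketch (same assumptions: POSIX sh, awk on PATH, throwaway one-line stand-ins for clnd.csv and data.csv) that prints ARGV[2] at the start of each file in both variants:

```shell
#!/bin/sh
# Scratch directory with throwaway stand-ins for the two input files
cd "$(mktemp -d)" || exit 1
printf 'x\n' > clnd.csv
printf 'y\n' > data.csv

prog='FNR == 1 { print FILENAME ": ARGV[2] = " ARGV[2] }'

# OFS=, as an operand occupies ARGV[1], so ARGV[2] is clnd.csv
operand=$(awk "$prog" OFS=, clnd.csv data.csv)
# OFS=, via -v is not an operand, so ARGV[2] is data.csv
option=$(awk -v OFS=, "$prog" clnd.csv data.csv)

printf '%s\n---\n%s\n' "$operand" "$option"
```

In the operand variant ARGV[2] is clnd.csv even while data.csv is being read, which is why a[FILENAME,$1] never matches a key built with ARGV[2].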
You can see the difference here:
$ gawk -v OFS=, -f a.awk a b
ARGV[0] = gawk
ARGV[1] = a
ARGV[2] = b
$ gawk -f a.awk OFS=, a b
ARGV[0] = gawk
ARGV[1] = OFS=,
ARGV[2] = a
ARGV[3] = b
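If you don't actually need a file name in the key (there is only one lookup file here), a simpler option is to key the array on the week code alone. A minimal sketch under that assumption, using the first two sample rows of each file from the question; it is written against POSIX awk behaviour and not tested on gawk 3.1.7:

```shell
#!/bin/sh
# Scratch directory holding the first two sample rows of each input file
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  '20180805,08/05/2018,w27_2018,WK27 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,27,WEEK 27,01,SUNDAY' \
  '20180812,08/12/2018,w28_2018,WK28 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,28,WEEK 28,01,SUNDAY' > clnd.csv
printf '%s\n' 'w27_2018,257,1,26.20,0.00,24.26' 'w28_2018,257,1,7.97,0.00,24.26' > data.csv

# The key is just the week code, so ARGV positions and FILENAME no longer matter
report=$(awk -F, -v OFS=, '
  NR == FNR { a[$3] = substr($12, 3, 4) FS $3 FS $4; next }  # first file: clnd.csv
  { $1 = a[$1] }                                              # second file: data.csv
  1' clnd.csv data.csv)

printf '%s\n' "$report"
```

Because the key involves neither ARGV nor FILENAME, moving OFS=, back to an operand should produce the same output.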