Reputation: 575
I'm hacking some AWK. I'm a beginner with it. I have done my homework on the following problem, and just can't get it to work.
Start Date 12/3/17
End Date 12/30/17
Report Type Report1
Currency ZAR
Country Identifier MType Quantity Net Net Net Code Title Contrib I_Type M_Type Vendor Identifier Offline Indicator LSN
ZA 44057330 FMP 1 0.050666 0.050666 USYYYYYYYYYY ABC Tom 1 1 USYYYYYYYYYY 0 SUT
ZA 1267456726 SIMT 1 0.03 0.03 USXXXXXXXXXX DEF Frances 1 1 USXXXXXXXXXX 0 XYZ
Row Count 657
Storefront Name MType Quantity Net Net
ZA FMP 601 30.45
ZA IAP 13 0.68
ZA IMP 1035 69.36
ZA SIMP 54 1.4
ZA FMT 70 0.53
ZA IMT 92 1.68
ZA SIMT 6 0.18
(I left the special characters un-escaped here.)
"Filename" "Start Date" "End Date" "Currency" "Country" "Identifier" "MType" "Quantity" "Net" "NetNet" "Code" "Title" "Contrib" "I_Type" "M_Type" "Vendor Identifier" "Offline Indicator" "LSN"
"rawfile.txt" "12/3/17" "12/30/17" "ZAR" "ZA" "44057330" "FMP" "1" "0.050666" "0.050666" "USYYYYYYYYYY" "ABC" "Tom" "1" "1" "USYYYYYYYYYY" "0" "SUT"
"rawfile.txt" "12/3/17" "12/30/17" "ZAR" "ZA" "1267456726" "SIMT" "1" "0.03" "0.03" "USXXXXXXXXXX" "DEF" "Frances" "1" "1" "USXXXXXXXXXX" "0" "XYZ"
Basically I just need to get most of the header from line 5, but three fields I need are in lines 1-4. Also, I don't need the data including and after the line that starts with "Row Count".
gawk '
function basename(file) {
sub(".*/", "", file)
return file
}
/^Row Count/ {nextfile}
FNR == 1 { StartDate=$2; }
FNR == 2 { EndDate=$2; }
FNR == 4 { curr=$2; }
NR == 5 {$0 = "StartDate" OFS "EndDate" OFS "Filename" OFS "curr" OFS $0; print}
FNR > 5 {$0 = StartDate OFS EndDate OFS basename(FILENAME) OFS curr OFS $0; print}
' OFS='\t' path/to/sourcefiles/*.txt > path/to/outfile.txt
Thanks!
These are the lines before the field headers in every file. Content begins on line 4:
Provider ,,,,,,,,,,,,
01/01/2018 - 01/31/2018,,,,,,,,,,,,
It almost works. But it includes lines 1-3 for every file: gawk ' function basename(file) { sub(".*/", "", file) return file } BEGIN { FS=OFS="," } NR < 3 { if ( NR == 2 ) { hdr = "Report_Period" OFS val = val $1 OFS } next } FNR>3 { print "Filename", hdr $0 next } { print basename(FILENAME), val $0 } ' OFS="," /path/to/input/files > ~/path/to/output/file/file.csv
End of edit
Upvotes: 1
Views: 108
Reputation: 203502
Your sample input format isn't clear but this might be what you're looking for or it might be doing more than necessary or something else entirely:
$ cat tst.awk
BEGIN { FS=OFS="\t" }
/^Row Count/ { nextfile }
FNR==1 {
fname = FILENAME
sub(/.*[/]/,"",fname)
}
{
gsub(/[\\]t/,FS)
gsub(/[\\][/]/,"/")
gsub(/[^\t]+/,"\"&\"")
}
FNR < 5 {
if ( FNR != 3 ) {
hdr = hdr $1 OFS
val = val $2 OFS
}
next
}
FNR==5 {
print "\"Filename\"", hdr $0
next
}
{ print "\""fname"\"", val $0 }
$ awk -f tst.awk file
"Filename" "Start Date" "End Date" "Currency" "Country" "Identifier" "MType" "Quantity" "Net" "Net Net" "Code" "Title" "Contrib" "I_Type" "M_Type" "Vendor Identifier" "Offline Indicator" "LSN"
"file" "12/3/17" "12/30/17" "ZAR" "ZA" "44057330" "FMP" "1" "0.050666" "0.050666" "USYYYYYYYYYY" "ABC" "Tom" "1" "1" "USYYYYYYYYYY" "0" "SUT"
"file" "12/3/17" "12/30/17" "ZAR" "ZA" "1267456726" "SIMT" "1" "0.03" "0.03" "USXXXXXXXXXX""DEF" "Frances" "1" "1" "USXXXXXXXXXX" "0" "XYZ"
The above uses GNU awk for nextfile, which you were already using.
Upvotes: 4