Steve
Steve

Reputation: 575

AWK to Consolidate Files

I'm hacking some AWK. I'm a beginner with it. I have done my homework on the following problem, and just can't get it to work.

RAW DATA SAMPLE:

Start Date  12/3/17
End Date    12/30/17
Report Type Report1
Currency    ZAR
Country Identifier  MType   Quantity    Net Net Net Code    Title   Contrib I_Type  M_Type  Vendor Identifier   Offline Indicator   LSN
ZA  44057330    FMP 1   0.050666    0.050666    USYYYYYYYYYY    ABC Tom 1   1   USYYYYYYYYYY    0   SUT
ZA  1267456726  SIMT    1   0.03    0.03    USXXXXXXXXXX    DEF Frances 1   1   USXXXXXXXXXX    0   XYZ
Row Count   657
Storefront Name MType   Quantity    Net Net
ZA  FMP 601 30.45
ZA  IAP 13  0.68
ZA  IMP 1035    69.36
ZA  SIMP    54  1.4
ZA  FMT 70  0.53
ZA  IMT 92  1.68
ZA  SIMT    6   0.18

DESIRED OUTPUT:

(I left the special characters un-escaped here.)

"Filename"  "Start Date"    "End Date"  "Currency"  "Country"   "Identifier"    "MType" "Quantity"  "Net"   "NetNet"    "Code"  "Title" "Contrib"   "I_Type"    "M_Type"    "Vendor Identifier" "Offline Indicator" "LSN"
"rawfile.txt"   "12/3/17"   "12/30/17"  "ZAR"   "ZA"    "44057330"  "FMP"   "1" "0.050666"  "0.050666"  "USYYYYYYYYYY"  "ABC"   "Tom"   "1" "1" "USYYYYYYYYYY"  "0" "SUT"
"rawfile.txt"   "12/3/17"   "12/30/17"  "ZAR"   "ZA"    "1267456726"    "SIMT"  "1" "0.03"  "0.03"  "USXXXXXXXXXX"  "DEF"   "Frances"   "1" "1" "USXXXXXXXXXX"  "0" "XYZ"

Basically I just need to get most of the header from line 5, but three fields I need are in lines 1-4. Also, I don't need the data including and after the line that starts with "Row Count".

MY BEST "GUESS" SO FAR:

gawk '
function basename(file) {
    sub(".*/", "", file)
    return file
  }
  /^Row Count/ {nextfile}
  FNR == 1 { StartDate=$2; }
  FNR == 2 { EndDate=$2; }
  FNR == 4 { curr=$2; }
  NR == 5 {$0 = "StartDate" OFS "EndDate" OFS "Filename" OFS "curr" OFS $0; print} 
  FNR > 5 {$0 =  StartDate OFS EndDate OFS basename(FILENAME) OFS curr OFS $0; print}
' OFS='\t' path/to/sourcefiles/*.txt > path/to/outfile.txt

Thanks!

EDITS:

NEW TABLE

These are the lines before the field headers in every file. Content begins on line 4:

Provider ,,,,,,,,,,,,
01/01/2018 - 01/31/2018,,,,,,,,,,,,

"MY" SCRIPT

It almost works. But it includes lines 1-3 for every file: gawk ' function basename(file) { sub(".*/", "", file) return file } BEGIN { FS=OFS="," } NR < 3 { if ( NR == 2 ) { hdr = "Report_Period" OFS val = val $1 OFS } next } FNR>3 { print "Filename", hdr $0 next } { print basename(FILENAME), val $0 } ' OFS="," /path/to/input/files > ~/path/to/output/file/file.csv

End of edit

Upvotes: 1

Views: 108

Answers (1)

Ed Morton
Ed Morton

Reputation: 203502

Your sample input format isn't clear but this might be what you're looking for or it might be doing more than necessary or something else entirely:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
/^Row Count/ { nextfile }
FNR==1 {
    fname = FILENAME
    sub(/.*[/]/,"",fname)
}
{
    gsub(/[\\]t/,FS)
    gsub(/[\\][/]/,"/")
    gsub(/[^\t]+/,"\"&\"")
}
FNR < 5 {
    if ( FNR != 3 ) {
        hdr = hdr $1 OFS
        val = val $2 OFS
    }
    next
}
FNR==5 {
    print "\"Filename\"", hdr $0
    next
}
{ print "\""fname"\"", val $0 }

$ awk -f tst.awk file
"Filename"      "Start Date"    "End Date"      "Currency"      "Country"       "Identifier"    "MType" "Quantity"   "Net"    "Net Net"       "Code"  "Title" "Contrib"       "I_Type"        "M_Type"        "Vendor Identifier"     "Offline Indicator"   "LSN"
"file"  "12/3/17"       "12/30/17"      "ZAR"   "ZA"    "44057330"      "FMP"   "1"     "0.050666"      "0.050666"   "USYYYYYYYYYY"   "ABC"   "Tom"   "1"     "1"     "USYYYYYYYYYY"  "0"     "SUT"
"file"  "12/3/17"       "12/30/17"      "ZAR"   "ZA"    "1267456726"    "SIMT"  "1"     "0.03"  "0.03"  "USXXXXXXXXXX""DEF"   "Frances"       "1"     "1"     "USXXXXXXXXXX"  "0"     "XYZ"

The above uses GNU awk for nextfile, which you were already using.

Upvotes: 4

Related Questions