saurav

Reputation: 219

Parsing data using awk

I have copied three columns of data into a comma-separated file and am doing some string manipulation on it, but I don't know why an empty '' is being added at the start of every line.

My input file:

"['3.FIT-DYN', '3.MYFIT-LTR-DYN']",FIT-L-PHY-PRM-GI,2014-07-11 14:07:28+0000
 ['1.839324'],4.FIDCWRRTL,2015-04-16 12:04:21+0000
              ,4.AIQM,2015-04-16 12:04:21+0000

Note that only the third line has empty data in the first column.

My awk script:

    BEGIN { FS=",?\"?[][]\"?,?"; OFS="," }
    {
        if (split($2,a,/\047/)) {
            for (j=2; j in a; j+=2) {
                $2 = a[j]
                prt()
            }
        }
        else {
            prt()
        }
    }

    function prt(   out) {
        out = "\047" $0 "\047"
        gsub(OFS,"\047,\047",out)
        print out
    }

Output:

    '','3.FIT-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '','3.MYFIT-LTR-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '','1.839324','4.FIDCWRRTL','2015-04-16 12:04:21+0000'
    '','4.AIQM','2015-04-16 12:04:21+0000'

Expected output:

    '3.FIT-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '3.MYFIT-LTR-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '1.839324','4.FIDCWRRTL','2015-04-16 12:04:21+0000'
    '','4.AIQM','2015-04-16 12:04:21+0000'

Upvotes: 0

Views: 53

Answers (1)

Michael Vehrs

Reputation: 3363

Your field separator matches square brackets (plus an optional adjacent quote and comma). In your data, the lines that contain a list start with such a separator, e.g. "[ on the first line. The first field is everything that precedes the first field separator, which in this case is the empty string.
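To see this for yourself, here is a minimal check. It is only a sketch: the function name demo_leading_separator is made up for illustration. It feeds your first input line through your own FS and prints the field count and the first two fields:

```shell
# Hypothetical helper: shows how the question's FS splits the first input line.
demo_leading_separator() {
  awk 'BEGIN { FS = ",?\"?[][]\"?,?" }
       { printf "NF=%d $1=<%s> $2=<%s>\n", NF, $1, $2 }' <<'EOF'
"['3.FIT-DYN', '3.MYFIT-LTR-DYN']",FIT-L-PHY-PRM-GI,2014-07-11 14:07:28+0000
EOF
}

demo_leading_separator
```

This should print NF=3 with an empty $1, because the line starts with "[ and that already counts as a separator. The list you want ends up in $2, which is why your script works on $2 and still carries the empty $1 into the output.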

I suggest something else:

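    BEGIN { FS="]\"?,"; OFS="," }
    # No "]" in the line: the list column is empty, so put '' in its place.
    NF == 1 { sub("^[[:blank:]]*", "''"); print; next }
    {
        # Strip leading blanks, the optional quote and the opening bracket.
        sub(" *\"?\\[", "", $1);
        # Print one line per list item.
        count = split($1, fields, ", *");
        for (i = 1; i <= count; i++) {
            $1 = fields[i]
            print
        }
    }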
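Note that the script above leaves the second and third columns unquoted. If you need every column wrapped in single quotes, as in your expected output, something along these lines might do. It is a sketch, not a general CSV parser: it assumes the columns after the list contain no commas or brackets of their own, and the names quote_fields and quote_all are made up for illustration.

```shell
# quote_fields is a hypothetical helper name; it reads the input lines
# from stdin or from the files given as arguments.
quote_fields() {
  awk '
    BEGIN { FS = "]\"?,"; q = "\047" }   # \047 is a single quote

    # Wrap each comma-separated piece of s in single quotes.
    function quote_all(s,   n, parts, i, sep, out) {
      n = split(s, parts, ",")
      out = ""
      for (i = 1; i <= n; i++) {
        sep = (i > 1) ? "," : ""
        out = out sep q parts[i] q
      }
      return out
    }

    # No "]" separator: the list column is empty, so emit an empty quoted field.
    NF == 1 {
      sub(/^[ \t]*,/, "")
      print q q "," quote_all($0)
      next
    }

    # Otherwise print one output line per item of the bracketed list.
    {
      list = $1
      sub(/^[ \t]*"?\[/, "", list)
      rest = quote_all($2)
      count = split(list, items, /,[ \t]*/)
      for (i = 1; i <= count; i++)
        print items[i] "," rest
    }
  ' "$@"
}

quote_fields <<'EOF'
"['3.FIT-DYN', '3.MYFIT-LTR-DYN']",FIT-L-PHY-PRM-GI,2014-07-11 14:07:28+0000
 ['1.839324'],4.FIDCWRRTL,2015-04-16 12:04:21+0000
              ,4.AIQM,2015-04-16 12:04:21+0000
EOF
```

The list items keep the quotes they already have in the input, so only the remaining columns need quoting. With a real file you would call it as quote_fields yourfile instead of using the here-document.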

Upvotes: 1
