Jeff

Reputation: 15

Using AWK to parse fields with commas

Edited - TLDR: Using awk to parse fields that include commas.

# original config file - confile1
$ cat confile1
list=(
app1,"HOSTNAME - port - application name - alert1",99.0,99.0
app2,"HOSTNAME - port - application name - alert1",99.0,99.0
app3,"HOSTNAME - port - service name - alert2",99.0,99.0
web1,"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1",99.0,99.0
)
# original script - test1
$ cat test1
#!/bin/bash

IFS="$(printf '\n\t')"

function parse
{
for item in ${list[*]}
do
  group=$(echo $item | awk -F, '{print $1}')
  monitor=$(echo $item | awk -F, '{print $2}')
  grp_sla=$(echo $item | awk -F, '{print $3}')
  mon_sla=$(echo $item | awk -F, '{print $4}')
  echo $group
  echo $monitor
  echo $grp_sla
  echo $mon_sla
done
}

. confile1
parse

Notice that the last line of confile1 is split incorrectly because its second field contains a comma, so `awk -F,` sees five fields instead of four:

  $ ./test1
    app1
    HOSTNAME - port - application name - alert1
    99.0
    99.0
    app2
    HOSTNAME - port - application name - alert1
    99.0
    99.0
    app3
    HOSTNAME - port - service name - alert2
    99.0
    99.0
    web1
    URL - HOSTNAMES(01
    02) - http://someurl.com/ - alert1
    99.0

Upvotes: 1

Views: 283

Answers (3)

Jeff

Reputation: 15

Ed Morton provided the exact info I needed. I already tested this on my main script and it is parsing perfectly! Here is the test code working:

$ awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $0}' confile4
app1,"HOSTNAME - port - application name - alert1",99.0,99.0
app2,"HOSTNAME - port - application name - alert1",99.0,99.0
app3,"HOSTNAME - port - service name - alert2",99.0,99.0
web1,"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1",99.0,99.0


$ cat test10
#!/bin/bash

IFS="$(printf '\n\t')"

function parse
{
for item in $(awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $0}' confile4)
do
  group=$(echo $item | awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $1}')
  monitor=$(echo $item | awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $2}')
  grp_sla=$(echo $item | awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $3}')
  mon_sla=$(echo $item | awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $4}')
  echo $group
  echo $monitor
  echo $grp_sla
  echo $mon_sla
done
}

parse

$ ./test10
app1
"HOSTNAME - port - application name - alert1"
99.0
99.0
app2
"HOSTNAME - port - application name - alert1"
99.0
99.0
app3
"HOSTNAME - port - service name - alert2"
99.0
99.0
web1
"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1"
99.0
99.0
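
The loop above starts four awk processes for every record. A possible variation splits each line once and lets `read` pick up the fields. This is only a sketch under some assumptions: every line has exactly one double-quoted field, no field contains a tab, and the input arrives on stdin. It swaps FPAT for a plain `"` field separator, so it should also run on awks other than GNU awk. The function name `parse_once` is made up for illustration.

```shell
#!/bin/sh
# Sketch: split each record once in awk, then read the four fields in the shell.
# Assumes exactly one double-quoted field per line and no tab characters.

tab=$(printf '\t')

parse_once() {
  # With -F'"' a record splits into: text before the quote,
  # the quoted text itself, and text after the closing quote.
  awk -F'"' 'NF == 3 {
      sub(/,$/, "", $1)        # drop the comma that ends the group field
      sub(/^,/, "", $3)        # drop the comma in front of the two SLA values
      split($3, sla, ",")
      printf "%s\t%s\t%s\t%s\n", $1, $2, sla[1], sla[2]
  }' |
  while IFS="$tab" read -r group monitor grp_sla mon_sla
  do
    printf '%s\n' "$group" "$monitor" "$grp_sla" "$mon_sla"
  done
}

# Usage with two of the sample lines from the question.
parse_once <<'EOF'
app1,"HOSTNAME - port - application name - alert1",99.0,99.0
web1,"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1",99.0,99.0
EOF
```

Unlike the FPAT version, this drops the surrounding double quotes from the monitor field, which may or may not be what the main script wants.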

Upvotes: 0

karakfa

Reputation: 67467

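Another solution for your specific data is to use NF to decide how to reassemble the fields: a line with 4 comma-separated fields is printed as is, and a line with 5 has its 2nd and 3rd fields rejoined with the comma. Here I'm setting OFS to make the field boundaries more visible.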

$ awk -F, 'BEGIN{OFS=" <-> "} NF==4{print $1, $2, $3, $4 } NF==5{print $1, $2","$3, $4, $5}' data.csv

app1 <-> "HOSTNAME - port - application name - alert1" <-> 99.0 <-> 99.0
app2 <-> "HOSTNAME - port - application name - alert1" <-> 99.0 <-> 99.0
app3 <-> "HOSTNAME - port - service name - alert2" <-> 99.0 <-> 99.0
web1 <-> "URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1" <-> 99.0 <-> 99.0
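
The `NF==5` rule covers exactly one extra comma. If the quoted field could contain more than one, a variation is to treat the first field and the last two as fixed and rejoin everything in between. This is a rough sketch, assuming the first field and the last two never contain commas; `split_fixed_ends` is a hypothetical name and the second sample line is altered to carry two embedded commas.

```shell
#!/bin/sh
# Sketch: keep field 1 and the last two fields, rejoin whatever lies between.
# Assumes only the second logical field can contain commas.

split_fixed_ends() {
  awk -F, 'BEGIN { OFS = " <-> " }
  NF >= 4 {
      mid = $2
      for (i = 3; i <= NF - 2; i++)
          mid = mid "," $i     # put back the commas that -F, split on
      print $1, mid, $(NF-1), $NF
  }'
}

split_fixed_ends <<'EOF'
app3,"HOSTNAME - port - service name - alert2",99.0,99.0
web1,"URL - HOSTNAMES(01,02,03) - http://someurl.com/ - alert1",99.0,99.0
EOF
```

Lines with fewer than 4 fields, such as `list=(` and `)`, are skipped by the `NF >= 4` condition.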

Upvotes: 0

Ed Morton

Reputation: 203219

I'm not willing to wade through your whole question (sorry, IMHO it's just too long with too much extraneous information) but it looks like you're trying to extract the individual fields from that "confile1" at the top of your question so maybe this is all the hint you need:

$ cat tst.awk
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" }
NF>1 {
    print "\nRecord", ++nr":", $0
    for (i=1; i<=NF; i++) {
        print "   Field", i":", $i
    }
}

$ awk -f tst.awk confile1

Record 1: app1,"HOSTNAME - port - application name - alert1",99.0,99.0
   Field 1: app1
   Field 2: "HOSTNAME - port - application name - alert1"
   Field 3: 99.0
   Field 4: 99.0

Record 2: app2,"HOSTNAME - port - application name - alert1",99.0,99.0
   Field 1: app2
   Field 2: "HOSTNAME - port - application name - alert1"
   Field 3: 99.0
   Field 4: 99.0

Record 3: app3,"HOSTNAME - port - service name - alert2",99.0,99.0
   Field 1: app3
   Field 2: "HOSTNAME - port - service name - alert2"
   Field 3: 99.0
   Field 4: 99.0

Record 4: web1,"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1",99.0,99.0
   Field 1: web1
   Field 2: "URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1"
   Field 3: 99.0
   Field 4: 99.0

The above uses GNU awk for FPAT (see http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content).
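
Because FPAT is a GNU extension, a script that relies on it may want to check for gawk first. The sketch below does that and also strips the surrounding double quotes from each field, which the output above keeps. It assumes gawk 4.0 or later is installed under the name `gawk`; the function name `fields_unquoted` is invented for this example.

```shell
#!/bin/sh
# Sketch: print each FPAT field on its own line with outer quotes removed.
# Assumes GNU awk is available as "gawk"; FPAT does not exist in other awks.

fields_unquoted() {
  if ! command -v gawk >/dev/null 2>&1; then
    echo "gawk not found; FPAT is a GNU awk extension" >&2
    return 1
  fi
  gawk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" }
  NF > 1 {
      for (i = 1; i <= NF; i++) {
          f = $i
          gsub(/^"|"$/, "", f)   # remove one leading and one trailing quote
          print f
      }
  }'
}

fields_unquoted <<'EOF'
web1,"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1",99.0,99.0
EOF
```

Copying the field into `f` before calling `gsub` leaves `$i` and `$0` untouched, in case the original record is still needed later in the rule.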

Especially since you are teaching yourself, I strongly recommend you get the books Effective Awk Programming, 4th Edition, by Arnold Robbins and Shell Scripting Recipes by Chris Johnson. It's EXTREMELY easy to go down the wrong path in UNIX given all of the possible ways you can approach solving any one problem.

Upvotes: 2
