Majeed
Majeed

Reputation: 77

How to convert my file in rows and column using shell scripting

I have a file in the format below. Can anyone convert it in columns? I have tried the awk command below but it creates more that 4 columns if one customer has multiple hostnames.

awk '/"customer_name":/{if (x)print x;x="";}{x=(!x)?$0:x","$0;}END{print x;}' filename

Input:

customer_name: "abc"
  "HostName": "tm-1"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "tm-2"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "tm-3"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "new-va-threat-01"
  "LastDayRxBytes": 0
  "Status": "offline"
customer_name: "xyz"
  "HostName": "tm-56"
  "LastDayRxBytes": 10708747
  "Status": "ok"
customer_name: "def"
customer_name: "uvw"
  "HostName": "tm-23"
  "LastDayRxBytes": 34921829912
  "Status": "ok"
customer_name: "new cust"
  "HostName": "tm-1-3"
  "LastDayRxBytes": 33993187093
  "Status": "ok"
customer_name: "a12 d32 ffg"
customer_name: "bcd abc"
customer_name: "mno opq"
customer_name: "abc dhg pvt ltd."
  "HostName": "tm-10"
  "LastDayRxBytes": 145774401010
  "Status": "ok"
  "HostName": "tm-ngtm-13"
  "LastDayRxBytes": 150159680874
  "Status": "ok"
  "HostName": "new-ngtm-11"
  "LastDayRxBytes": 207392526747
  "Status": "ok"
  "HostName": "old-ngtm-06"
  "LastDayRxBytes": 17708734533
  "Status": "ok"
  "HostName": "tm-08"
  "LastDayRxBytes": 559289251
  "Status": "ok"
  "HostName": "tm-12"
  "LastDayRxBytes": 534145552271
  "Status": "ok"

I want it to be printed in column and rows as:

Column 1               Column 2             Column 3             Column 4
CustName               Host                 Last RX              Status
abc                    tm-1                 0                    offline
abc                    tm-2                 0                    offline
abc                    tm-3                 0                    offline
abc                    new-va-threat-01     0                    offline
xyz                    tm-56                10708747             ok
def                    
uvw                    tm-23                34921829912          ok
new_cust               tm-1-3               33993187093          ok
a12 d32 ffg
acd abc
mno opq
abc dhg pvt ltd.       tm-10                145774401010         ok
abc dhg pvt ltd.       tm-ngtm-13           150159680874         ok
abc dhg pvt ltd.       new-ngtm-11          207392526747         ok
abc dhg pvt ltd.       old-ngtm-06          17708734533          ok
abc dhg pvt ltd.       tm-08                559289251            ok
abc dhg pvt ltd.       tm-12                534145552271         ok

Upvotes: 0

Views: 81

Answers (3)

Marc Lambrichs
Marc Lambrichs

Reputation: 2882

gnu awk solution:

$ cat tst.awk
BEGIN {
   RS="customer_name: "
   pr("Column1", "Column2", "Column3", "Column4")
   pr("Custname", "Host", "Last RX", "Status")
}
match($0, /"([^"]+)"/, cust) {
   printed=0
   str=substr($0, RLENGTH+2)
   while (match( str, /"HostName":\s"([^"]+)"\s+"LastDayRxBytes":\s(\S+)\s+"Status":\s"([^"]+)"\s/, col)){
      str=substr(str, RLENGTH+3)
      pr( cust[1], col[1], col[2], col[3] )
      printed=1
   }
   if (!printed) pr(cust[1])
}
function pr(cust,host,rx,status) {
   printf "%-16s\t%-16s\t%-16s\t%-10s\n", cust, host, rx, status
}

Based on the example input, one can tackle this one using regexes and the match function as well. Testing it:

$ awk -f tst.awk input.txt
Column1             Column2             Column3             Column4
Custname            Host                Last RX             Status
abc                 tm-1                0                   offline
abc                 tm-2                0                   offline
abc                 tm-3                0                   offline
abc                 new-va-threat-01    0                   offline
xyz                 tm-56               10708747            ok
def
uvw                 tm-23               34921829912         ok
new cust            tm-1-3              33993187093         ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.    tm-10               145774401010        ok
abc dhg pvt ltd.    tm-ngtm-13          150159680874        ok
abc dhg pvt ltd.    new-ngtm-11         207392526747        ok
abc dhg pvt ltd.    old-ngtm-06         17708734533         ok
abc dhg pvt ltd.    tm-08               559289251           ok
abc dhg pvt ltd.    tm-12               534145552271        ok

Explanation:

  • record separator RS is set on customer_name:, so $0 contains all host, rx and status information per customer.
  • 1st match with regex "([^"]+)" will capture the customer
  • 2nd match with regex "HostName":\s"([^"]+)"\s+"LastDayRxBytes":\s(\S+)\s+"Status":\s"([^"]+)"\s will capture hostname, rx and status.
  • if the 2nd match succeeds, shorten the string you want to use in your next match.

I know, this is not the awk way of doing things, but then again the regular format of the input allows this - quite concise - regex-based solution.

Upvotes: 0

glenn jackman
glenn jackman

Reputation: 246744

I'd write this

awk -F": " -v OFS="\t" '
    BEGIN {print "CustName", "Host", "Last RX", "Status"}
    {
        gsub(/"/,"")
        sub(/^[[:blank:]]+/,"")
    }
    $1 == "customer_name" {
        if ("customer_name" in data && !have_data)
            print data["customer_name"]
        have_data = 0
    }
    {
        data[$1] = $2
    }
    ("HostName" in data) && ("LastDayRxBytes" in data) && ("Status" in data) {
        print data["customer_name"], data["HostName"], data["LastDayRxBytes"], data["Status"]
        delete data["HostName"]
        delete data["LastDayRxBytes"]
        delete data["Status"]
        have_data = 1
    }
' file | column -s $'\t' -t
CustName          Host              Last RX       Status
abc               tm-1              0             offline
abc               tm-2              0             offline
abc               tm-3              0             offline
abc               new-va-threat-01  0             offline
xyz               tm-56             10708747      ok
def
uvw               tm-23             34921829912   ok
new cust          tm-1-3            33993187093   ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.  tm-10             145774401010  ok
abc dhg pvt ltd.  tm-ngtm-13        150159680874  ok
abc dhg pvt ltd.  new-ngtm-11       207392526747  ok
abc dhg pvt ltd.  old-ngtm-06       17708734533   ok
abc dhg pvt ltd.  tm-08             559289251     ok
abc dhg pvt ltd.  tm-12             534145552271  ok

Upvotes: 1

choroba
choroba

Reputation: 241758

Perl to the rescue:

perl -lne '
    if (/customer_name: "(.*)"/) {
        print $h{name} unless $h{printed} || !%h;
        undef $h{printed} if $1 ne $h{name};
        $h{name} = $1;
    } else {
        /"([^"]+)": "?([^"]+)"?/ and $h{$1} = $2;
        $h{printed} = print join "\t",
            @h{qw{ name HostName LastDayRxBytes Status }}
            if "Status" eq $1;
    }
    END { print $h{name} unless $h{printed} || !%h }
    ' < input_file
  • The %h hash is used to gather information about lines to be printed.
  • When a customer name is read, the previous customer name is printed if it hasn't been printed yet. The same happens at the very end of the input to print a possible last customer with no details.
  • A line is printed when Status is read.

Upvotes: 0

Related Questions