CodeDevotion
CodeDevotion

Reputation: 307

How to convert space separated key value data into CSV format in bash?

I am working on some data files where data is of key and value pairs that are separated by space.

The data in files is inconsistent. All the Key and values are not always present.But the keys will always be as Table, count and size.

Below example has table_name, count, size information

cat sample1.txt
Table SCOTT.TABLE1 count 3889 size 300
Table SCOTT.TABLE2 count 7744
Table SCOTT.TABLE3 count 2622
Table SCOTT.TABLE4 size 2773 count 22
Table SCOTT.TABLE5 size 21

Below file have just table_name but no count and size data.

cat sample2.txt
Table SCOTT.TABLE1
Table SCOTT.TABLE2
Table SCOTT.TABLE3
Table SCOTT.TABLE4
Table SCOTT.TABLE5

So I am trying to convert these files into CSV format using following

cat <file_name> | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS=","  } NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'

cat sample1.txt | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS=","  }
NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }
{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'

Table,Count,Size
SCOTT.TABLE1,3889,300
,,
,,
,,

And for the second sample

cat sample2.txt | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS=","  } NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'
Table,Count,Size
SCOTT.TABLE1,,
,,
,,
,,

But exepected as following:

For sample1.txt

TABLE,count,size
SCOTT.TABLE1,3889,300
SCOTT.TABLE2,7744,
SCOTT.TABLE3,2622
SCOTT.TABLE4,22,2773
SCOTT.TABLE5,,21

For sample2.txt

Table,Count,Size
SCOTT.TABLE1,,
SCOTT.TABLE2,,
SCOTT.TABLE3,,
SCOTT.TABLE4,,
SCOTT.TABLE5,,

Thanks in advance.

Upvotes: 0

Views: 721

Answers (2)

karakfa
karakfa

Reputation: 67467

awk to the rescue!

$ awk -v OFS=',' '{for(i=1;i<NF;i+=2) 
                     {if(!($i in c)){c[$i];cols[++k]=$i};
                      v[NR,$i]=$(i+1)}} 
               END{for(i=1;i<=k;i++) printf "%s", cols[i] OFS; 
                   print ""; 
                   for(i=1;i<=NR;i++) 
                     {for(j=1;j<=k;j++) printf "%s", v[i,cols[j]] OFS;
                      print ""}}' file

Table,count,size,
SCOTT.TABLE1,3889,300,
SCOTT.TABLE2,7744,,
SCOTT.TABLE3,2622,,
SCOTT.TABLE4,22,2773,
SCOTT.TABLE5,,21,

if you have gawk you can simplify it more with sorted-in

UPDATE For the revised question, the header needs to be known in advance since the keys might be completely missing. This simplifies the problem and the following script should do the trick.

$ awk -v header='Table,count,size' \
      'BEGIN{OFS=","; n=split(header,h,OFS); print header} 
            {for(i=1; i<NF; i+=2) v[NR,$i]=$(i+1)} 
         END{for(i=1; i<=NR; i++) 
               {printf "%s", v[i,h[1]]; 
                for(j=2; j<=n; j++) printf "%s", OFS v[i,h[j]]; 
                print ""}}' file

Upvotes: 1

webb
webb

Reputation: 4340

here is an inelegant but fast and comprehensible solution:

awk 'BEGIN{OFS=",";print "TABLE,count,size"}
  {
    t=$2
    if($3=="count"){
      c=$4
      s=$6
    }
    else{
      s=$4
      c=$6
    }
    print t,c,s
  }' 1.txt

output:

TABLE,count,size
SCOTT.TABLE1,3889,300
SCOTT.TABLE2,7744,
SCOTT.TABLE3,2622,
SCOTT.TABLE4,22,2773
SCOTT.TABLE5,,21

Upvotes: 1

Related Questions