Reputation: 307
I am working on some data files where data is of key and value pairs that are separated by space.
The data in files is inconsistent. All the Key and values are not always present.But the keys will always be as Table, count and size.
Below example has table_name, count, size information
cat sample1.txt
Table SCOTT.TABLE1 count 3889 size 300
Table SCOTT.TABLE2 count 7744
Table SCOTT.TABLE3 count 2622
Table SCOTT.TABLE4 size 2773 count 22
Table SCOTT.TABLE5 size 21
Below file have just table_name but no count and size data.
cat sample2.txt
Table SCOTT.TABLE1
Table SCOTT.TABLE2
Table SCOTT.TABLE3
Table SCOTT.TABLE4
Table SCOTT.TABLE5
So I am trying to convert these files into CSV format using following
cat <file_name> | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS="," } NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'
cat sample1.txt | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS="," }
NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }
{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'
Table,Count,Size
SCOTT.TABLE1,3889,300
,,
,,
,,
And for the second sample
cat sample2.txt | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS="," } NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'
Table,Count,Size
SCOTT.TABLE1,,
,,
,,
,,
But exepected as following:
For sample1.txt
TABLE,count,size
SCOTT.TABLE1,3889,300
SCOTT.TABLE2,7744,
SCOTT.TABLE3,2622
SCOTT.TABLE4,22,2773
SCOTT.TABLE5,,21
For sample2.txt
Table,Count,Size
SCOTT.TABLE1,,
SCOTT.TABLE2,,
SCOTT.TABLE3,,
SCOTT.TABLE4,,
SCOTT.TABLE5,,
Thanks in advance.
Upvotes: 0
Views: 721
Reputation: 67467
awk
to the rescue!
$ awk -v OFS=',' '{for(i=1;i<NF;i+=2)
{if(!($i in c)){c[$i];cols[++k]=$i};
v[NR,$i]=$(i+1)}}
END{for(i=1;i<=k;i++) printf "%s", cols[i] OFS;
print "";
for(i=1;i<=NR;i++)
{for(j=1;j<=k;j++) printf "%s", v[i,cols[j]] OFS;
print ""}}' file
Table,count,size,
SCOTT.TABLE1,3889,300,
SCOTT.TABLE2,7744,,
SCOTT.TABLE3,2622,,
SCOTT.TABLE4,22,2773,
SCOTT.TABLE5,,21,
if you have gawk
you can simplify it more with sorted-in
UPDATE For the revised question, the header needs to be known in advance since the keys might be completely missing. This simplifies the problem and the following script should do the trick.
$ awk -v header='Table,count,size' \
'BEGIN{OFS=","; n=split(header,h,OFS); print header}
{for(i=1; i<NF; i+=2) v[NR,$i]=$(i+1)}
END{for(i=1; i<=NR; i++)
{printf "%s", v[i,h[1]];
for(j=2; j<=n; j++) printf "%s", OFS v[i,h[j]];
print ""}}' file
Upvotes: 1
Reputation: 4340
here is an inelegant but fast and comprehensible solution:
awk 'BEGIN{OFS=",";print "TABLE,count,size"}
{
t=$2
if($3=="count"){
c=$4
s=$6
}
else{
s=$4
c=$6
}
print t,c,s
}' 1.txt
output:
TABLE,count,size
SCOTT.TABLE1,3889,300
SCOTT.TABLE2,7744,
SCOTT.TABLE3,2622,
SCOTT.TABLE4,22,2773
SCOTT.TABLE5,,21
Upvotes: 1