Convert key:value to CSV file

Question

I found the following bash script for converting a file with key:value information to a CSV file:

awk -F ":" -v OFS="," '
BEGIN { print "category","recommenderSubtype", "resource", "matchesPattern", "resource", "value" }
function printline() {
print data["category"], data["recommenderSubtype"], data["resource"], data["matchesPattern"], data["resource"], data["value"]
}
{data[$1] = $2}
NF == 0 {printline(); delete data}
END {printline()}
' file.yaml

But when I execute it, it only converts the first group of data (only the first 6 rows), like this:

category,recommenderSubtype,resource,matchesPattern,resource,value
COST,CHANGE_MACHINE_TYPE,instance-1,f1-micro,instance-1,g1-small

My original file looks like this (with 1000 or more rows):

category:COST
recommenderSubtype:CHANGE_MACHINE_TYPE
resource:portal-1
matchesPattern:f1-micro
resource:portal-1
value:g1-small
category:PERFORMANCE
recommenderSubtype:CHANGE_MACHINE_TYPE
resource:old-3
matchesPattern:n1-standard-4
resource:old-3
value:n1-highmem-2

Is there a command I am missing?

linux-fan · Accepted Answer

The problem with the original script lies in these lines:

NF == 0 {printline(); delete data}
END {printline()}

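The first line means: call printline() if the current line has no fields, i.e. it is empty. The second line means: call printline() after all data has been processed. Since the sample input contains no empty lines between the groups, the first rule never fires, and only the END rule prints a row.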
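To see why the NF == 0 rule never triggers, here is a minimal sketch that counts how many lines satisfy it. The helper name count_blank and the shortened sample data are made up for illustration; they are not part of the original script.

```shell
#!/bin/sh
# count_blank is a hypothetical helper: it counts the input lines
# for which NF == 0 holds, using the same field separator as above.
count_blank() {
    awk -F ":" 'NF == 0 { n++ } END { print n + 0 }'
}

# Groups follow each other directly, as in the file from the question.
no_blank=$(printf 'category:COST\nvalue:g1-small\ncategory:PERFORMANCE\nvalue:n1-highmem-2\n' | count_blank)

# Same data, but with an empty line between the two groups.
with_blank=$(printf 'category:COST\nvalue:g1-small\n\ncategory:PERFORMANCE\nvalue:n1-highmem-2\n' | count_blank)

echo "without blank separator: $no_blank, with blank separator: $with_blank"
```

The rule matches only when the groups are separated by an empty line, which appears to be the input layout the original script was written for.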

The difficulty with the input data format is that it gives no clear indicator of when to output the next row. In the following, I have simply changed the script to output the data after every six input lines (records, in awk terms). If the number of lines per group can vary (for instance because of duplicate keys), the criterion for output could instead be "all fields populated" or similar, which would need to be programmed slightly differently.

#!/bin/sh -e
awk -F ":" -v OFS="," '
BEGIN {
    records_in = 0
    # CSV header line
    print "category","recommenderSubtype", "resource", "matchesPattern", "resource", "value"
}
{
    data[$1] = $2    # store the value under its key
    records_in++
    if(records_in == 6) {
        # a complete group of six lines has been read: emit one CSV row
        records_in = 0
        print data["category"], data["recommenderSubtype"], data["resource"], data["matchesPattern"], data["resource"], data["value"]
    }
}
' file.yaml
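For the "all fields populated" criterion mentioned above, one possible sketch follows. It assumes that every group contains each of the five distinct keys at least once and that value is the last key to appear, as in the sample data. The variable names (names, want, seen) are illustrative, the input is inlined so the snippet runs without file.yaml, and the header line is omitted for brevity.

```shell
#!/bin/sh
out=$(printf '%s\n' \
    'category:COST' 'recommenderSubtype:CHANGE_MACHINE_TYPE' 'resource:portal-1' \
    'matchesPattern:f1-micro' 'resource:portal-1' 'value:g1-small' \
    'category:PERFORMANCE' 'recommenderSubtype:CHANGE_MACHINE_TYPE' 'resource:old-3' \
    'matchesPattern:n1-standard-4' 'resource:old-3' 'value:n1-highmem-2' |
awk -F ":" -v OFS="," '
BEGIN {
    # the distinct keys that make up one complete group
    n = split("category recommenderSubtype resource matchesPattern value", names, " ")
    for (i = 1; i <= n; i++) want[names[i]] = 1
}
($1 in want) {
    if (!($1 in data)) seen++    # count each distinct key once per group
    data[$1] = $2
    if (seen == n) {
        # every expected key has a value: emit the row and start over
        print data["category"], data["recommenderSubtype"], data["resource"], data["matchesPattern"], data["resource"], data["value"]
        for (key in data) delete data[key]
        seen = 0
    }
}
')
echo "$out"
```

Compared with counting six lines, this variant tolerates repeated keys inside a group, but it would print a row too early if an updated value for some key could arrive after all keys have been seen.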

Other comments

I have simply removed the delete statement because I am unsure what it does when applied to a whole array. The POSIX specification for awk only defines it for deleting single array elements; to delete a whole array, it recommends looping over the elements. If all fields are always present, however, the statement may not be needed at all, since each new group overwrites the values of the previous one.
Welcome to SO (I am new here as well). Next time you ask a question, I would recommend tagging it awk rather than bash, because AWK is the scripting language actually used here; bash is only responsible for calling awk with suitable parameters :)
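Regarding the delete statement from the first comment: a minimal sketch of clearing a whole array element by element, which is the loop-based approach referred to there. The array contents are placeholder values.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    data["category"] = "COST"
    data["value"] = "g1-small"

    # remove the elements one at a time instead of using "delete data"
    for (key in data) delete data[key]

    # count what is left to confirm the array is empty
    n = 0
    for (key in data) n++
    print n
}')
echo "$out"
```

Many awk implementations also accept delete data for the whole array, but the loop should work regardless of which awk is installed.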
