Reputation: 3
I found the following bash script for converting a file with key:value information to CSV file:
awk -F ":" -v OFS="," '
BEGIN { print "category","recommenderSubtype", "resource", "matchesPattern", "resource", "value" }
function printline() {
print data["category"], data["recommenderSubtype"], data["resource"], data["matchesPattern"], data["resource"], data["value"]
}
{data[$1] = $2}
NF == 0 {printline(); delete data}
END {printline()}
' file.yaml
But after executed it, it only converts the first group of data (only the first 6 rows of data), like this
category,recommenderSubtype,resource,matchesPattern,resource,value
COST,CHANGE_MACHINE_TYPE,instance-1,f1-micro,instance-1,g1-small
My original file is like this (with 1000 rows and more):
category:COST
recommenderSubtype:CHANGE_MACHINE_TYPE
resource:portal-1
matchesPattern:f1-micro
resource:portal-1
value:g1-small
category:PERFORMANCE
recommenderSubtype:CHANGE_MACHINE_TYPE
resource:old-3
matchesPattern:n1-standard-4
resource:old-3
value:n1-highmem-2
Is there any command am I missing?
Upvotes: 0
Views: 265
Reputation: 373
The problem with the original script are these lines:
NF == 0 {printline(); delete data}
END {printline()}
The first line means: Call printline() if the current line has no records. The second line means call printline()
after all data has been processed.
The difficulty with the input data format is that it does not really give a good indicator when to output the next record. In the following, I have simply changed the script to output the data every six records. In case there can be duplicate keys, the criterion for output might be "all fields populated" or such which would need to be programmed slightly differently.
#!/bin/sh -e
awk -F ":" -v OFS="," '
BEGIN {
records_in = 0
print "category","recommenderSubtype", "resource", "matchesPattern", "resource", "value"
}
{
data[$1] = $2
records_in++
if(records_in == 6) {
records_in = 0;
print data["category"], data["recommenderSubtype"], data["resource"], data["matchesPattern"], data["resource"], data["value"]
}
}
' file.yaml
Other commends
delete
statement, because I am unsure what it does. The POSIX specification for awk
only defines it for deleting single array elements. In case the whole array should be deleted, it recommends doing a loop over the elements. In case all fields are always present, however, it might as well be possible to eliminate it altogether.awk
rather than bash
because AWK is really the scripting language used in this question with bash
only being responsible for calling awk
with suitable parameters :)Upvotes: 1