how to collapse fields with zeros from a datafile

Question

I have a file with data as follows

col1,col2,col3,col4,col5,col6,col7,col8,col9,col10
1,2,3,4,5,6,7,8,9,10
1,2,1,2,0,1,0,1,0,1
1,1,1,1,0,2,3,0,0,0
5,1,1,0,0,0,0,0,1,0

I would like to change the delimiters from col6 through column 10 to pipe '|' and the column value would be followed by column name.

Desired Output:

1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,0,col6:1|col8:1|col10:1
1,1,1,1,0,col6:2|col7:3
5,1,1,0,0,col9:1

I tried using the command

awk -F ', *' 'NR==1{for (i=1; i<=NF; i++) hdr[i]=$i; next}
   {for (i=1; i<=NF; i++) if ($i>0) printf "%s%s", ((i>5)?hdr[i] ":":"") $i,
            ((i5)?"|":",") : ORS)}' data.csv

but not getting the result as expected

Output:

1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,col6:1|col8:1|col10:1
1,1,1,1,col6:2|col7:3|5,1,1,col9:1|

The columns that does not contain a zero is ending with '|' and the next line is starting data starts there ! In this example, row 2 data ends with a pipe '|' and row 3 data starts in the row 2. row 4 data ends with a pipe '|'

Can some one help me fix this please

P.S: For people looking for the reason behind all this work, I'm trying to load the data from a csv file in to a framework. Source data has 10 columns and The destination dataset would have 6 columns - first 5 from source as is and the rest as a map. Also, I'll have to make sure that there is no map key with the value as zero and then start the data analysis on the set. This post is to get help for making the data set ready for analysis.

John1024 · Accepted Answer

$ awk -F ', *' 'NR==1{for (i=1; i<=NF; i++) hdr[i]=$i":"; next} {for (i=1; i<=5; i++) printf $i","; b=""; for (i=6; i<=NF; i++) if ($i>0) {printf "%s%s", b, hdr[i] $i; b="|";} printf ORS}' data.csv
1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,0,col6:1|col8:1|col10:1
1,1,1,1,0,col6:2|col7:3
5,1,1,0,0,col9:1

Or, written over multiple lines:

awk -F ', *' '
NR==1{
    for (i=1; i<=NF; i++) hdr[i]=$i":"
    next
}

{
    for (i=1; i<=5; i++) printf $i","
    b=""
    for (i=6; i<=NF; i++) if ($i>0) {printf "%s%s", b, hdr[i] $i; b="|";}
    printf ORS
}
' data.csv

How it works

NR==1{for (i=1; i<=NF; i++) hdr[i]=$i":"; next}

For the first line, NR==1, we save each field and a trailing colon into array hdr. Then, the rest of the commands are skipped and we just to the next line.
for (i=1; i<=5; i++) printf $i","

If we get here, we are working or the second are later lines. In this case, we print the first five fields, each followed by a comma.
b=""

We initialize the variable b to the empty string.
for (i=6; i<=NF; i++) if ($i>0) {printf "%s%s", b, hdr[i] $i; b="|";}

For fields 6 to the end, if the field is nonzero, we print b followed by the hdr followed by the field value. After we have encountered the first such nonzero field, b is set to |.
printf ORS

After printing the last field, we print an output record separator (default is a newline).

how to collapse fields with zeros from a datafile

Answers (2)

How it works

Related Questions