Reputation: 420
I have a file with data as follows
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10
1,2,3,4,5,6,7,8,9,10
1,2,1,2,0,1,0,1,0,1
1,1,1,1,0,2,3,0,0,0
5,1,1,0,0,0,0,0,1,0
I would like to change the delimiters from col6 through column 10 to pipe '|' and the column value would be followed by column name.
Desired Output:
1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,0,col6:1|col8:1|col10:1
1,1,1,1,0,col6:2|col7:3
5,1,1,0,0,col9:1
I tried using the command
awk -F ', *' 'NR==1{for (i=1; i<=NF; i++) hdr[i]=$i; next}
{for (i=1; i<=NF; i++) if ($i>0) printf "%s%s", ((i>5)?hdr[i] ":":"") $i,
((i<NF)? ((i>5)?"|":",") : ORS)}' data.csv
but not getting the result as expected
Output:
1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,col6:1|col8:1|col10:1
1,1,1,1,col6:2|col7:3|5,1,1,col9:1|
The columns that does not contain a zero is ending with '|' and the next line is starting data starts there ! In this example, row 2 data ends with a pipe '|' and row 3 data starts in the row 2. row 4 data ends with a pipe '|'
Can some one help me fix this please
P.S: For people looking for the reason behind all this work, I'm trying to load the data from a csv file in to a framework. Source data has 10 columns and The destination dataset would have 6 columns - first 5 from source as is and the rest as a map. Also, I'll have to make sure that there is no map key with the value as zero and then start the data analysis on the set. This post is to get help for making the data set ready for analysis.
Upvotes: 2
Views: 75
Reputation: 579
The above solution is excellent. Helps me with a similar issue. However, I need to cater for an all-zero case in columns 6 to 10. See the last line of your data below.
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10
1,2,3,4,5,6,7,8,9,10
1,2,1,2,0,1,0,1,0,1
1,1,1,1,0,2,3,0,0,0
5,1,1,0,0,0,0,0,1,0
5,1,1,0,0,0,0,0,0,0
This might never happen in your data, however if it does you are left with an inconvenient comma at the end of the line:
1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,0,col6:1|col8:1|col10:1
1,1,1,1,0,col6:2|col7:3
5,1,1,0,0,col9:1
5,1,1,0,0,
To get around it I made change. Here it is, somewhat spread out for clarity:
awk -F ', *' '
NR==1{
for (i=1; i<=NF; i++) hdr[i]=$i":"
next
}
{
for (i=1; i<5; i++) printf("%s,", $i);
if(i==5) printf("%s", $i);
b="";
for (i=6; i<=NF; i++) {
if ($i>0) {
if(b=="") b=","; else b="|";
printf("%s%s",b, hdr[i] $i);
}
}
printf(ORS);
}
Upvotes: 1
Reputation: 113994
$ awk -F ', *' 'NR==1{for (i=1; i<=NF; i++) hdr[i]=$i":"; next} {for (i=1; i<=5; i++) printf $i","; b=""; for (i=6; i<=NF; i++) if ($i>0) {printf "%s%s", b, hdr[i] $i; b="|";} printf ORS}' data.csv
1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,0,col6:1|col8:1|col10:1
1,1,1,1,0,col6:2|col7:3
5,1,1,0,0,col9:1
Or, written over multiple lines:
awk -F ', *' '
NR==1{
for (i=1; i<=NF; i++) hdr[i]=$i":"
next
}
{
for (i=1; i<=5; i++) printf $i","
b=""
for (i=6; i<=NF; i++) if ($i>0) {printf "%s%s", b, hdr[i] $i; b="|";}
printf ORS
}
' data.csv
NR==1{for (i=1; i<=NF; i++) hdr[i]=$i":"; next}
For the first line, NR==1
, we save each field and a trailing colon into array hdr
. Then, the rest of the commands are skipped and we just to the next
line.
for (i=1; i<=5; i++) printf $i","
If we get here, we are working or the second are later lines. In this case, we print the first five fields, each followed by a comma.
b=""
We initialize the variable b
to the empty string.
for (i=6; i<=NF; i++) if ($i>0) {printf "%s%s", b, hdr[i] $i; b="|";}
For fields 6 to the end, if the field is nonzero, we print b
followed by the hdr
followed by the field value. After we have encountered the first such nonzero field, b
is set to |
.
printf ORS
After printing the last field, we print an output record separator (default is a newline).
Upvotes: 3