Reputation: 786
I have to transform (preprocess) a CSV file, by generating / inserting a new column, being the result of the concat of existing columns.
For example, transform:
A|B|C|D|E
into:
A|B|C|D|C > D|E
In this example, I do it with:
cat myfile.csv | awk 'BEGIN{FS=OFS="|"} {$4 = $4 OFS $3" > "$4} 1'
But now I have something more complex to do, and dont find how to do this.
I have to transform:
A|B|C|x,y,z|E
into
A|B|C|x,y,z|C > x,C > y,C > z|E
How can it be done in awk (or other command) efficiently (my csv file can contains thousands of lines)?
Thanks.
Upvotes: 3
Views: 113
Reputation: 5252
You can split the 4th field into an array:
awk 'BEGIN{FS=OFS="|"} {split($4,a,",");$4="";for(i=1;i in a;i++)$4=($4? $4 "," : "") $3 " > " a[i]} 1' myfile.csv
A|B|C|C > x,C > y,C > z|E
Upvotes: 3
Reputation: 26481
There are many ways to do this, but the simplest is the following:
$ awk 'BEGIN{FS=OFS="|"}{t=$4;gsub(/[^,]+/,$3" > &",t);$4 = $4 OFS t}1'
we make a copy of the fourth field in variable t
. In there, we replace every string which does not contain the new separator (,
) by the content of the third field followed by >
and the original matched string (&
).
Upvotes: 2
Reputation: 29167
With GNU awk
(for gensub
which is a GNU extension):
awk -F'|' '{$6=$5; $5=gensub(/(^|,)/,"\\1" $3 " > ","g",$4); print}' OFS='|'
Upvotes: 3