Alexxx
Alexxx

Reputation: 786

Awk multiple transformation / separators in once

I have to transform (preprocess) a CSV file, by generating / inserting a new column, being the result of the concat of existing columns.

For example, transform:

A|B|C|D|E

into:

A|B|C|D|C > D|E

In this example, I do it with:

cat myfile.csv | awk 'BEGIN{FS=OFS="|"} {$4 = $4 OFS $3" > "$4} 1'

But now I have something more complex to do, and dont find how to do this.

I have to transform:

A|B|C|x,y,z|E

into

A|B|C|x,y,z|C > x,C > y,C > z|E

How can it be done in awk (or other command) efficiently (my csv file can contains thousands of lines)?

Thanks.

Upvotes: 3

Views: 113

Answers (3)

Tyl
Tyl

Reputation: 5252

You can split the 4th field into an array:

awk 'BEGIN{FS=OFS="|"} {split($4,a,",");$4="";for(i=1;i in a;i++)$4=($4? $4 "," : "") $3 " > " a[i]} 1' myfile.csv
A|B|C|C > x,C > y,C > z|E

Upvotes: 3

kvantour
kvantour

Reputation: 26481

There are many ways to do this, but the simplest is the following:

$ awk 'BEGIN{FS=OFS="|"}{t=$4;gsub(/[^,]+/,$3" > &",t);$4 = $4 OFS t}1'

we make a copy of the fourth field in variable t. In there, we replace every string which does not contain the new separator (,) by the content of the third field followed by > and the original matched string (&).

Upvotes: 2

Renaud Pacalet
Renaud Pacalet

Reputation: 29167

With GNU awk (for gensub which is a GNU extension):

awk -F'|' '{$6=$5; $5=gensub(/(^|,)/,"\\1" $3 " > ","g",$4); print}' OFS='|'

Upvotes: 3

Related Questions