How to transform a csv file having multiple delimiters using awk

Question

Below is a sample data. Please note this operation is required to be done on files with millions of records hence I need the optimal method. Essentially we are looking to update 2nd column with concatenation of first two characters from 4th column and excluding first 3 fields ('_' delimited) of 2nd column.

I have been trying using cut and reading the file line by line which is very time consuming. I need something with awk something like

awk -F, '{print $1","substr($4,1,2)"_"cut -f4-6 -d'_'($2)","$3","$4","$5","$6}'

Input Data:

234234234,123_33_3_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,123_11_2_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,123_33_3_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,123_33_3_11111_qewf_mkhsdf,01,09_68645,43234532,2

Output is required as:

234234234,06_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,07_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,08_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,09_11111_qewf_mkhsdf,01,09_68645,43234532,2

Jose Ricardo Bustos M. · Accepted Answer

You can use awk and printf for line re-formating

awk -F"[,_]" '{
    printf "%s,%s_%s_%s_%s,%s,%s_%s,%s,%s
", $1,$9,$5,$6,$7,$8,$9,$10,$11,$12
}' file

you get,

234234234,06_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,07_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,08_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,09_11111_qewf_mkhsdf,01,09_68645,43234532,2

How to transform a csv file having multiple delimiters using awk

Answers (1)

Related Questions