Manny
Manny

Reputation: 37

Separating delimited file with awk

I have a file named test.csv with the following list:

3732476,83440122!87368863!83440122!84947700!85873717!84671134!26592084
3732477,84213509!86563958!87368863!83440122!84947700!85873717!84671134!26592084
3732478,86774401!87852418!86774401!86774401!86774425!84022200!83908575!70993638
3732479,86774401!86774425!86774401!86774401!86774425!84022200!83908575!70993638
3732480,86774401!86774425!87726670!86774401!86774425!84022200!83908575!70993638
3732481,86301748!86301748!86769792!86434407!27580741!86730857!27953536!83913569
3732482,27500429!83445205!27500429!27500702!83445205!88324688!27500958
3732483,27500429!83445205!27500429!27500702!83445205!88324688!27500958
3732484,87920354!89442350!87920354!89442290!27632964!27633136!86450151
3732485,84406968!86504550!86348235!87708562!86279816!84406968!88066692

The block is composed of an ID followed by a series of numbers. The delimiter between the ID and the numbers is a comma (,), and the series of numbers have a delimiter of an exclamation mark (!). The ammount of series varies.

I need to display: ID,Series,Number of series like so

For line 3732485,84406968!86504550!86348235!87708562!86279816!84406968!88066692 I'd get:

3732485,84406968,1
3732485,86504550,2
3732485,86348235,3
3732485,87708562,4
3732485,86279816,5
3732485,84406968,6
3732485,88066692,7

I think I can do this via AWK, but I'm not that agile. I can get the series number with NR. I can separate the fields with the field separator FS but I cannot get it to print the $1 field next to the series value followed by the series sequence number.

Anyone want to point me to the right direction?

THanks!

Upvotes: 1

Views: 152

Answers (1)

jaypal singh
jaypal singh

Reputation: 77105

Using awk:

awk -F[,\!] -v OFS="," '{for(i=2;i<=NF;i++) print $1,$i,(i-1)}' file
3732476,83440122,1
3732476,87368863,2
...
...
3732485,84406968,6
3732485,88066692,7
  • Set the input field separator to , and !. You need to escape the !.
  • Set the output field separator to ,
  • Iterate from second column to the last column and print first column, remaining column and count. You subtract 1 from the count as needed in your output.

Note: I have not printed the entire output but you get the picture.

Upvotes: 3

Related Questions