Vaulstein

Split file using awk at pattern

Here is an example of the data that I have in a row in example.tsv:

somedata1:data1#||#somedata2:data2#||#somedata1:data3#||#somedata2:data4

I want to do two things:

  1. Split the data on the pattern '#||#' and write it to another file. The number of fields after splitting is not fixed. I tried this awk command:

    awk -F"#\|\|#" '{print;}' example.tsv > splitted.tsv

    The first output file should look like this:

    column 1
    somedata1:data1
    somedata2:data2
    somedata1:data3
    somedata2:data4

  2. Next, I want to split the data in splitted.tsv on ':'.

    The result should look like this (shown for somedata1):

        somedata1 data1 data3

    and be written to a file. Is there a way to do this in a single awk command?

Answers (2)

terdon

For the first split, you could try:

$ awk 'BEGIN{print "column1"}{gsub(/#\|\|#/,"\n"); print }' file
column1
somedata1:data1
somedata2:data2
somedata1:data3
somedata2:data4

To then split on ':' as well, you could do:

$ awk -F: 'BEGIN{print "column1","column2"}
                {gsub(/#\|\|#/,"\n"); gsub(/:/," ");print }' file
column1 column2
somedata1 data1
somedata2 data2
somedata1 data3
somedata2 data4
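The files in the question are .tsv, so a tab-separated version of the second command may be closer to what is wanted. This is a sketch under that assumption: it keeps the same gsub() approach but replaces ':' with a tab and sets OFS to a tab so the header is tab-separated too. It also recreates example.tsv from the question's sample row so it can be run as-is.

```shell
# Recreate the sample row from the question (assumed to be the only line of example.tsv).
printf '%s\n' 'somedata1:data1#||#somedata2:data2#||#somedata1:data3#||#somedata2:data4' > example.tsv

# Same idea as above, but ":" becomes a tab and the header fields are
# joined with OFS (a tab), so splitted.tsv is tab-separated.
awk -v OFS='\t' '
  BEGIN { print "column1", "column2" }
  { gsub(/#\|\|#/, "\n"); gsub(/:/, "\t"); print }
' example.tsv > splitted.tsv
```

Only the replacement strings differ from the space-separated version, so any other output delimiter can be swapped in the same way.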

Jotne

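You need to escape the | characters correctly: in the -F argument each one needs a double backslash, because awk processes escape sequences in that string before using it as a regular expression. Then use split():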

awk -F'#\\|\\|#' '{split($2,a,":");print a[2]}' file
data2
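If the double backslashes are awkward to get right, a bracket expression is another way to make | literal: inside [ ], | has no special meaning in a regular expression, so no escaping is needed. This is a standard regex feature rather than part of the answer above; the sketch recreates example.tsv from the question's sample row.

```shell
# Recreate the sample row from the question (assumed to be the only line of example.tsv).
printf '%s\n' 'somedata1:data1#||#somedata2:data2#||#somedata1:data3#||#somedata2:data4' > example.tsv

# [|] matches a literal "|" without any backslashes, so neither shell
# quoting nor awk escape processing can change the separator.
awk -F'#[|][|]#' '{ split($2, a, ":"); print a[2] }' example.tsv
# prints: data2
```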

To print every field on its own line:

awk -F'#\\|\\|#' '{for (i=1;i<=NF;i++) print $i}' file
somedata1:data1
somedata2:data2
somedata1:data3
somedata2:data4

To also split each field on ':':

awk -F'#\\|\\|#' '{for (i=1;i<=NF;i++) {split($i,a,":");print a[1],a[2]}}' file
somedata1 data1
somedata2 data2
somedata1 data3
somedata2 data4
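Neither answer covers the last part of the question: doing both steps in a single awk command and collecting the values for each key (somedata1 data1 data3). A minimal sketch, assuming the wanted result is one tab-separated line per key with its values in input order, keys listed in order of first appearance. It writes splitted.tsv as in the question plus a second file, grouped.tsv, whose name is hypothetical. The separator is spelled '#[|][|]#', which matches the same literal #||# as '#\\|\\|#' without relying on escape processing.

```shell
# Recreate the sample row from the question (assumed to be the only line of example.tsv).
printf '%s\n' 'somedata1:data1#||#somedata2:data2#||#somedata1:data3#||#somedata2:data4' > example.tsv

# One pass, two output files: splitted.tsv (step 1) and the
# hypothetical grouped.tsv (step 2, one tab-separated line per key).
awk -F'#[|][|]#' -v OFS='\t' '
  BEGIN { print "column 1" > "splitted.tsv" }
  {
    for (i = 1; i <= NF; i++) {
      # Step 1: each key:value field on its own line.
      print $i > "splitted.tsv"
      # Step 2: split the field on ":" and append the value to its key.
      split($i, a, ":")
      if (!(a[1] in vals)) { order[++nkeys] = a[1]; vals[a[1]] = a[2] }
      else vals[a[1]] = vals[a[1]] OFS a[2]
    }
  }
  END {
    # awk arrays are unordered, so order[] replays keys in first-seen order.
    for (k = 1; k <= nkeys; k++) print order[k], vals[order[k]] > "grouped.tsv"
  }' example.tsv
```

If the order of keys does not matter, the order[] bookkeeping can be dropped in favour of a plain for (key in vals) loop, at the cost of an unspecified output order.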
