user2162153

Reputation: 267

How to parse only one column with a different delimiter?

I have a tab-delimited file. The last column is semicolon-delimited, and its entries are long and vary in number from line to line. I want to split this column into separate fields.

Input:

AA   762    8640    BB    CC     DD      EE=T;FF=C;GG=G;HHA
II   852    6547    JJ    KK     LL      MM=G;NN=P;QQ=RF

Desired output:

AA   762    8640    BB    CC     DD      EE=T    FF=C    GG=G   HHA
II   852    6547    JJ    KK     LL      MM=G    NN=P    QQ=RF

I can get, for example, the first three values with this code:

awk 'BEGIN { FS=";" } { print $1, $2, $3}' file

But when I run the following, it does not split the column and just prints the file as is:

awk 'BEGIN { FS=";" } { print $0}' file

How can I fix this?

Upvotes: 2

Views: 115

Answers (3)

Mark Setchell

Reputation: 207405

Use tr to replace semicolons with tabs, like this:

tr ";" "\t" < yourfile
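
Note that tr acts on every character of the input, so this assumes no semicolons appear in the earlier columns. A minimal sketch on the sample data; `split_semicolons` is a hypothetical helper name, and the sample line is fed through printf rather than read from a real file:

```shell
# Hypothetical helper: turn every ';' on stdin into a tab.
split_semicolons() {
    tr ';' '\t'
}

# Sample line from the question, with real tabs between the columns.
printf 'AA\t762\t8640\tBB\tCC\tDD\tEE=T;FF=C;GG=G;HHA\n' | split_semicolons
```

If semicolons can occur outside the last column, an approach that touches only the last field is the safer choice.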

Upvotes: 2

iruvar

Reputation: 23374

Another approach, which splits on any whitespace character as well as on semicolons:

awk -F'[[:space:];]' -v OFS='\t' '{$1=$1;print}' input.txt
AA  762 8640    BB  CC  DD  EE=T    FF=C    GG=G    HHA
II  852 6547    JJ  KK  LL  MM=G    NN=P    QQ=RF
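
The `$1=$1` assignment is what makes this work: awk only rebuilds `$0` with OFS when a field is assigned, which is also why `print $0` on its own prints each line unchanged. A small sketch of the difference, assuming a POSIX awk; the function names and the one-line input are made up for illustration:

```shell
# No field is assigned, so $0 is printed exactly as it was read.
unchanged() {
    awk -F';' -v OFS='\t' '{ print $0 }'
}

# Assigning a field (even to itself) makes awk rebuild $0 using OFS.
rebuilt() {
    awk -F';' -v OFS='\t' '{ $1 = $1; print $0 }'
}

printf 'EE=T;FF=C;GG=G\n' | unchanged   # semicolons are kept
printf 'EE=T;FF=C;GG=G\n' | rebuilt     # semicolons become tabs
```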

Upvotes: 3

jaypal singh

Reputation: 77085

You can try something like:

awk 'BEGIN{FS=OFS="\t"}{gsub(/;/,"\t",$NF)}1' file

$ cat file
AA  762 8640    BB  CC  DD  EE=T;FF=C;GG=G;HHA
II  852 6547    JJ  KK  LL  MM=G;NN=P;QQ=RF

$ awk 'BEGIN{FS=OFS="\t"}{gsub(/;/,"\t",$NF)}1' file
AA  762 8640    BB  CC  DD  EE=T    FF=C    GG=G    HHA
II  852 6547    JJ  KK  LL  MM=G    NN=P    QQ=RF
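
If the individual entries are needed rather than just re-delimited output, the last field can also be split into an array. A hedged sketch, assuming a POSIX awk and tab-separated input; `list_entries` is a hypothetical helper name and the output layout (first column plus one entry per line) is only for illustration:

```shell
# Hypothetical helper: print the first column followed by each
# semicolon-separated entry of the last column, one entry per line.
list_entries() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        n = split($NF, parts, ";")   # n is the number of entries found
        for (i = 1; i <= n; i++)
            print $1, parts[i]
    }'
}

printf 'II\t852\t6547\tJJ\tKK\tLL\tMM=G;NN=P;QQ=RF\n' | list_entries
```

Because split() returns the number of pieces, this copes with lines whose last column holds a different number of entries.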

Upvotes: 3
