Reputation: 267
I have a tab-delimited file. The last column is a long, semicolon-delimited list whose number of items varies from line to line. I want to split this column into separate tab-delimited fields.
Input:
AA 762 8640 BB CC DD EE=T;FF=C;GG=G;HHA
II 852 6547 JJ KK LL MM=G;NN=P;QQ=RF
Desired output:
AA 762 8640 BB CC DD EE=T FF=C GG=G HHA
II 852 6547 JJ KK LL MM=G NN=P QQ=RF
I could get e.g. the first three values using this code:
awk 'BEGIN { FS=";" } { print $1, $2, $3}' file
But when I run this, it does not split the column and just prints the file as is:
awk 'BEGIN { FS=";" } { print $0}' file
How can I fix this?
Upvotes: 2
Views: 115
Reputation: 207405
Use tr to replace semicolons with tabs like this:
tr ";" "\t" <yourfile
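As a quick check, here is a minimal sketch that builds a two-line sample with printf and runs it through tr. It assumes the real input is tab-delimited as described in the question; the temporary file and the variable names are placeholders of mine. Note that tr replaces every semicolon on the line, not only those in the last column, so this is only safe when no other column contains a semicolon.

```shell
# Build a two-line tab-delimited sample in a temporary file (placeholder name).
sample=$(mktemp)
printf 'AA\t762\t8640\tBB\tCC\tDD\tEE=T;FF=C;GG=G;HHA\n' > "$sample"
printf 'II\t852\t6547\tJJ\tKK\tLL\tMM=G;NN=P;QQ=RF\n' >> "$sample"

# Replace every semicolon with a tab; tr interprets the \t escape itself.
tr_out=$(tr ';' '\t' < "$sample")
printf '%s\n' "$tr_out"

rm -f "$sample"
```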
Upvotes: 2
Reputation: 23374
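Another awk approach: split on whitespace or semicolons, then assign $1=$1 so that awk rebuilds the record with tabs as the output separator. Be aware that [[:space:]] also splits on ordinary spaces inside a field.
awk -F'[[:space:];]' -v OFS='\t' '{$1=$1;print}' input.txt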
AA 762 8640 BB CC DD EE=T FF=C GG=G HHA
II 852 6547 JJ KK LL MM=G NN=P QQ=RF
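The $1=$1 assignment is the important part, and it also explains why the attempt in the question printed the file unchanged: awk only rebuilds $0 with OFS when a field is assigned. The sketch below compares the two cases on a shortened sample line of my own, using the question's FS=";" for simplicity; the variable names are illustrative.

```shell
# A shortened sample line: two tab-separated columns plus a semicolon list.
line=$(printf 'AA\t762\tEE=T;FF=C')

# No field is modified, so $0 is printed exactly as it was read.
unchanged=$(printf '%s\n' "$line" | awk -F';' -v OFS='\t' '{ print }')

# Assigning a field forces awk to rebuild $0 with OFS between the fields.
rebuilt=$(printf '%s\n' "$line" | awk -F';' -v OFS='\t' '{ $1 = $1; print }')

printf '%s\n%s\n' "$unchanged" "$rebuilt"
```

The first line of output still contains the semicolon; the second has it replaced by a tab.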
Upvotes: 3
Reputation: 77085
You can try something like:
awk 'BEGIN{FS=OFS="\t"}{gsub(/;/,"\t",$NF)}1' file
$ cat file
AA 762 8640 BB CC DD EE=T;FF=C;GG=G;HHA
II 852 6547 JJ KK LL MM=G;NN=P;QQ=RF
$ awk 'BEGIN{FS=OFS="\t"}{gsub(/;/,"\t",$NF)}1' file
AA 762 8640 BB CC DD EE=T FF=C GG=G HHA
II 852 6547 JJ KK LL MM=G NN=P QQ=RF
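If you want to go on and work with the individual items rather than only re-delimit them, split() on $NF is one option. This is a rough sketch under the assumption that the last tab-delimited field holds the semicolon list; the array name parts and the one-item-per-line output layout are my own choices, not something from the question.

```shell
parsed=$(printf 'AA\t762\t8640\tBB\tCC\tDD\tEE=T;FF=C;GG=G;HHA\n' |
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # Split the last tab-delimited field on semicolons into parts[1..n].
    n = split($NF, parts, ";")
    for (i = 1; i <= n; i++)
      print $1, i, parts[i]   # first column, item index, item
  }')
printf '%s\n' "$parsed"
```

Each item then sits in parts[i], so it can be filtered or split again on "=" inside the loop.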
Upvotes: 3