Reputation: 561
I can't work out how to split up the fields of my file:
chr2 215672546 rs6435862 G T 54.00 LowDP;sb DP=10;TI=NM_000465;GI=BARD1;FC=Silent ... ...
I would like to print the first seven fields and, from the 8th field, only DP=10 and GI=BARD1. The DP and GI info is always in the 8th field. The fields continue (...), so the 8th field is not the last one.
I know how to extract the 8th field:
awk '{print $8}' PLZ-10_S2.vcf | awk -F ";" '/DP/ {OFS="\t"} {print $1}'
and of course how to extract the first seven fields, but how do I put the two together? All fields are separated by tabs.
Upvotes: 0
Views: 201
Reputation: 203522
If DP= and GI= are always in the same position within $8:
$ awk 'BEGIN{FS=OFS="\t"} {split($8,a,/;/); $8=a[1]";"a[3]} 1' file
chr2 215672546 rs6435862 G T 54.00 LowDP;sb DP=10;GI=BARD1 ... ...
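To see why a[1] and a[3] are the pieces to keep, it may help to print what split() stores for the 8th field. This is only a minimal sketch: show_subfields is a hypothetical helper name, and it takes the field value as an argument instead of reading it from the file.

```shell
# show_subfields is a made-up name for illustration; it prints each
# semicolon-separated piece of its argument next to its array index.
show_subfields() {
    printf '%s\n' "$1" |
    awk '{
        n = split($0, a, /;/)      # n = number of pieces
        for (i = 1; i <= n; i++)
            print i, a[i]          # index, then the piece stored there
    }'
}

show_subfields 'DP=10;TI=NM_000465;GI=BARD1'
```

For the sample line this lists DP=10 at index 1, TI=NM_000465 at index 2 and GI=BARD1 at index 3, which is why the command above joins a[1] and a[3].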
If not:
$ awk 'BEGIN{FS=OFS="\t"} {split($8,a,/;/); $8=""; for (i=1;i in a;i++) $8 = $8 (a[i] ~ /^(DP|GI)=/ ? ($8?";":"") a[i] : "")} 1' file
chr2 215672546 rs6435862 G T 54.00 LowDP;sb DP=10;GI=BARD1 ... ...
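As a quick check that this second form does not depend on position, the sketch below feeds it a line where the sub-fields of $8 are shuffled. The wrapper name filter_info and the reordered input line are assumptions made up for the demonstration; the awk program is the one-liner above, unchanged.

```shell
# filter_info is a hypothetical wrapper around the position-independent one-liner.
filter_info() {
    awk 'BEGIN{FS=OFS="\t"} {split($8,a,/;/); $8=""; for (i=1;i in a;i++) $8 = $8 (a[i] ~ /^(DP|GI)=/ ? ($8?";":"") a[i] : "")} 1'
}

# Invented test line: same data as the sample, but DP= now comes last in $8.
printf 'chr2\t215672546\trs6435862\tG\tT\t54.00\tLowDP;sb\tTI=NM_000465;GI=BARD1;FC=Silent;DP=10\t...\t...\n' |
    filter_info
```

The kept entries come out in the order they appear in the input, so here the 8th field becomes GI=BARD1;DP=10 rather than DP=10;GI=BARD1.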
Upvotes: 2
Reputation: 36262
One way is to split() the eighth field on semicolons and traverse the results, keeping those that begin with DP or GI:
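awk '
BEGIN { FS = OFS = "\t" }
{
    # split() returns the number of pieces; length(array) is not available in every awk
    n = split( $8, arr8, /;/ )
    $8 = ""
    for ( i = 1; i <= n; i++ ) {
        # The "=" keeps longer keys that merely start with DP or GI from matching
        if ( arr8[i] ~ /^(DP|GI)=/ ) {
            $8 = $8 arr8[i] ";"
        }
    }
    # Drop the trailing semicolon
    $8 = substr( $8, 1, length($8) - 1 )
    print $0
}
' infile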
It yields:
chr2 215672546 rs6435862 G T 54.00 LowDP;sb DP=10;GI=BARD1 ... ...
Upvotes: 1