Reputation: 7255
The data in my file is stored in tab-separated columns. Each column holds several fields separated by ':'. I want to update the 1st field of a column if there is a pipe '|' in either the 6th or the 8th field of that column.
A line of my data is as follows:
0/1:38,59:97:99:.:.:2015,0,1366:0|1:.,.,.,.,.,.,.,.,.,.,.,.,.,.:1311:|:0.5 0/1:89,56:145:99:0|1:5238_G_C:2074,0,5187:.:.:.:.:. 0/1:31,65:96:99:.:.:2208,0,1170:.:.:.:.:. 0/1:58,74:132:99:.:.:2457,0,1761:.:.:.:.:.
In this input there is a pipe in the 8th field of the first column, so the 1st field of that column is replaced by the value of the 8th field. A pipe can appear in either the 6th or the 8th field, never in both. Pipes at other field positions don't matter. For the above input the expected output is:
0|1:38,59:97:99:.:.:2015,0,1366:0|1:.,.,.,.,.,.,.,.,.,.,.,.,.,.:1311:|:0.5 0/1:89,56:145:99:0|1:5238_G_C:2074,0,5187:.:.:.:.:. 0/1:31,65:96:99:.:.:2208,0,1170:.:.:.:.:. 0/1:58,74:132:99:.:.:2457,0,1761:.:.:.:.:.
And, I have the following code:
First I update the value from the 6th field:
awk 'BEGIN{FS=OFS="\t"} {for (i=1;i<=NF;i++) { split($i,f,/:/); if (f[6]~/\|/) sub(/^[^:]+/,f[6],$i) } }1' 2ms01e_only.pHASER01.vcf > 2ms01e_PG_6th.pHASER01.vcf
and then I update the value from the 8th field:
awk 'BEGIN{FS=OFS="\t"} {for (i=1;i<=NF;i++) { split($i,f,/:/); if (f[8]~/\|/) sub(/^[^:]+/,f[8],$i) } }1' 2ms01e_PG_6th.pHASER01.vcf > 2ms01e_PG_transfered.pHASER01.vcf
How can I write the above code (two separate commands) as one command, so that the 1st field is updated if there is a pipe in either the 6th or the 8th field? The updated value is pulled from whichever of those two fields contains the pipe. The given code does what I want, but I have to run it in two passes. I just want a single one-line command.
Thanks,
Upvotes: 0
Views: 52
Reputation: 37394
V2:
If you just want those two commands combined, this is one way:
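awk 'BEGIN { FS=OFS="\t" }
{
    for (i=1; i<=NF; i++) {
        # split the column into its ":"-separated fields
        split($i, f, /:/);
        # copy field 6 or field 8 over field 1 when it contains a pipe
        if (f[6] ~ /\|/)
            sub(/^[^:]+/, f[6], $i);
        if (f[8] ~ /\|/)
            sub(/^[^:]+/, f[8], $i);
    }
} 1' 2ms01e_only.pHASER01.vcf > 2ms01e_PG_6th.pHASER01.vcf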
Untested.
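Since the question asks for a single line, here is a more compact sketch of the same idea. It assumes, as stated in the question, that the pipe is never in both the 6th and the 8th field, and that the replacement value contains no `&` or backslash (which `sub` would treat specially). The function name `update_first_field` is only a hypothetical wrapper for the demonstration; the awk program inside it is the one-liner.

```shell
# Hypothetical wrapper so the one-liner can be tried on sample data.
update_first_field() {
  awk 'BEGIN{FS=OFS="\t"} {for (i=1;i<=NF;i++) { split($i,f,/:/); v = (f[6]~/\|/) ? f[6] : f[8]; if (v~/\|/) sub(/^[^:]+/,v,$i) } }1' "$@"
}

# Two tab-separated columns: the first has a pipe in field 8,
# the second only has one in field 5, so it should stay unchanged.
printf '%s\t%s\n' \
  '0/1:38,59:97:99:.:.:2015,0,1366:0|1:.:1311:|:0.5' \
  '0/1:89,56:145:99:0|1:5238_G_C:2074,0,5187:.:.:.:.:.' |
  update_first_field
```

On a real file the call would look like `update_first_field 2ms01e_only.pHASER01.vcf > 2ms01e_PG_transfered.pHASER01.vcf`, using the file names from the question.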
Upvotes: 1