Reputation: 477
I have a file (file.txt) that looks like this with 6 columns:
CHROM ID REF ALT ALT_FREQS OBS_CT
1 rs376342519:10616:CC:C CC C 0.995708 3862
1 1:54712:T:TTTTC T TTTTC 0.595903 3862
1 rs368808541:603010:C:A C A 0.00194132 3862
1 1:625248:CA:C CA C 0.00326171 3862
1 1:746828:AGG:A AGGATTGCTGAC A 0.00364759 3862
1 1:774926:GAC:G GAC G 0.000900845 3862
1 1:775682:G:A G A 0.00183606 3862
1 rs149821290:781367:A:C A C 0.00493168 3862
1 rs139691522:800053:A:G A G 0.69276305 3862
If the 5th column is above 0.5 above I need to do 1 - (the value in column 5). E.g. Output would look like this with the new value in a new column called new_MAF. If the value is 0.5 or less it can just be printed as it is.
CHROM ID REF ALT ALT_FREQS OBS_CT new_MAF
1 rs376342519:10616:CC:C CC C 0.995708 3862 0.004292
1 1:54712:T:TTTTC T TTTTC 0.595903 3862 0.404097
1 rs368808541:603010:C:A C A 0.00194132 3862 0.00194132
1 1:625248:CA:C CA C 0.00326171 3862 0.00326171
1 1:746828:AGG:A AGGATTGCTGAC A 0.00364759 3862 0.00364759
1 1:774926:GAC:G GAC G 0.000900845 3862 0.000900845
1 1:775682:G:A G A 0.00183606 3862 0.00183606
1 rs149821290:781367:A:C A C 0.00493168 3862 0.00493168
1 rs139691522:800053:A:G A G 0.69276305 3862 0.30723695
I have used the following awk script below:
awk '{
if ($5 > 0.5)
print 1 - $5
else
print $5
}' file.txt > newfile.txt
It provides the right output
0.004292
0.404097
0.00194132
0.00326171
0.00364759
0.000900845
0.00183606
0.00493168
0.30723695
But I need it in the same file1
in a new column called new_MAF
at the end.
Upvotes: 1
Views: 62
Reputation: 14948
you can also assign the seventh column ($(NF+1)
) with a value:
awk '{ $(NF+1)=($5>0.5 ? 1-$5 : $5) }1' file1
(and use 1
, in the end will print $0
)
Upvotes: 2
Reputation: 785561
You may use:
awk '{print $0, (NR==1 ? "new_MAF" : ($5 > 0.5 ? 1-$5 : $5))}' file > _tmp && mv _tmp file
cat file | column -t
CHROM ID REF ALT ALT_FREQS OBS_CT new_MAF
1 rs376342519:10616:CC:C CC C 0.995708 3862 0.004292
1 1:54712:T:TTTTC T TTTTC 0.595903 3862 0.404097
1 rs368808541:603010:C:A C A 0.00194132 3862 0.00194132
1 1:625248:CA:C CA C 0.00326171 3862 0.00326171
1 1:746828:AGG:A AGGATTGCTGAC A 0.00364759 3862 0.00364759
1 1:774926:GAC:G GAC G 0.000900845 3862 0.000900845
1 1:775682:G:A G A 0.00183606 3862 0.00183606
1 rs149821290:781367:A:C A C 0.00493168 3862 0.00493168
1 rs139691522:800053:A:G A G 0.69276305 3862 0.307237
I have used column -t
for tabular display only that you can remove.
Upvotes: 2