HKJ3
HKJ3

Reputation: 477

How to use awk to calculate a new value and storing it in a new column?

I have a file (file.txt) that looks like this with 6 columns:

CHROM   ID  REF ALT ALT_FREQS   OBS_CT
1   rs376342519:10616:CC:C  CC  C   0.995708    3862
1   1:54712:T:TTTTC T   TTTTC   0.595903    3862
1   rs368808541:603010:C:A  C   A   0.00194132  3862
1   1:625248:CA:C   CA  C   0.00326171  3862
1   1:746828:AGG:A  AGGATTGCTGAC    A   0.00364759  3862
1   1:774926:GAC:G  GAC G   0.000900845 3862
1   1:775682:G:A    G   A   0.00183606  3862
1   rs149821290:781367:A:C  A   C   0.00493168  3862
1   rs139691522:800053:A:G  A   G   0.69276305  3862

If the 5th column is above 0.5 above I need to do 1 - (the value in column 5). E.g. Output would look like this with the new value in a new column called new_MAF. If the value is 0.5 or less it can just be printed as it is.

CHROM   ID  REF ALT ALT_FREQS   OBS_CT  new_MAF
1   rs376342519:10616:CC:C  CC  C   0.995708    3862  0.004292
1   1:54712:T:TTTTC T   TTTTC   0.595903    3862    0.404097
1   rs368808541:603010:C:A  C   A   0.00194132  3862   0.00194132
1   1:625248:CA:C   CA  C   0.00326171  3862   0.00326171
1   1:746828:AGG:A  AGGATTGCTGAC    A   0.00364759  3862  0.00364759
1   1:774926:GAC:G  GAC G   0.000900845 3862  0.000900845
1   1:775682:G:A    G   A   0.00183606  3862  0.00183606
1   rs149821290:781367:A:C  A   C   0.00493168  3862   0.00493168
1   rs139691522:800053:A:G  A   G   0.69276305  3862   0.30723695

I have used the following awk script below:

awk '{
if ($5 > 0.5)
   print 1 - $5
else 
   print $5
  }' file.txt > newfile.txt

It provides the right output

0.004292
0.404097
0.00194132
0.00326171
0.00364759
0.000900845
0.00183606
0.00493168
0.30723695

But I need it in the same file1 in a new column called new_MAF at the end.

Upvotes: 1

Views: 62

Answers (2)

Luuk
Luuk

Reputation: 14948

you can also assign the seventh column ($(NF+1)) with a value:

awk '{ $(NF+1)=($5>0.5 ? 1-$5 : $5) }1' file1

(and use 1, in the end will print $0 )

Upvotes: 2

anubhava
anubhava

Reputation: 785561

You may use:

awk '{print $0, (NR==1 ? "new_MAF" : ($5 > 0.5 ? 1-$5 : $5))}' file > _tmp && mv _tmp file

cat file | column -t

CHROM  ID                      REF           ALT    ALT_FREQS    OBS_CT  new_MAF
1      rs376342519:10616:CC:C  CC            C      0.995708     3862    0.004292
1      1:54712:T:TTTTC         T             TTTTC  0.595903     3862    0.404097
1      rs368808541:603010:C:A  C             A      0.00194132   3862    0.00194132
1      1:625248:CA:C           CA            C      0.00326171   3862    0.00326171
1      1:746828:AGG:A          AGGATTGCTGAC  A      0.00364759   3862    0.00364759
1      1:774926:GAC:G          GAC           G      0.000900845  3862    0.000900845
1      1:775682:G:A            G             A      0.00183606   3862    0.00183606
1      rs149821290:781367:A:C  A             C      0.00493168   3862    0.00493168
1      rs139691522:800053:A:G  A             G      0.69276305   3862    0.307237

I have used column -t for tabular display only that you can remove.

Upvotes: 2

Related Questions