How to remove the columns which contain NA in Linux

Question

I would like to remove every column that contains one or more NA values. I used this command:

awk ' $0 !="NA" {print $0}' file

But it does not work. For example, the file is as follows:

1  2 3 NA  6  male
4  6 2 1   NA female
NA 2 2 NA  3  male
7  2 2 7   NA male

I want the output file to be:

  2 3 male
  6 2 female
  2 2 male
  2 2 male

Barmar · Accepted Answer

You need to make two passes over the data. The first pass should save all the input lines in an array, find the numbers of the columns that contain NA, and record those in a second array. Then at the end you print all the saved lines, but skip the columns listed in the second array.

awk '{ lines[NR] = $0; for (i = 1; i <= NF; i++) if ($i == "NA") skip[i] = 1;}
     END { for (i = 1; i <= NR; i++) {
            nf = split(lines[i], fields);
            for (j = 1; j <= nf; j++) if (!(j in skip)) printf("%s ", fields[j]);
            printf("\n");
           } 
         }' inputfile > outputfile
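
If the file is too large to hold in an awk array, the same two-pass idea can be sketched by giving awk the file name twice, so that it reads the file once to find the NA columns and a second time to print. This is a variation on the approach above, not part of the original answer. It assumes whitespace-separated fields and an input that can be re-read (a regular file, not a pipe). The here-document only recreates the sample data from the question so the snippet can be run as is.

```shell
# Recreate the sample data from the question (illustration only).
cat > inputfile <<'EOF'
1  2 3 NA  6  male
4  6 2 1   NA female
NA 2 2 NA  3  male
7  2 2 7   NA male
EOF

# The file is named twice. While NR == FNR awk is still on the first read,
# so it only records which column numbers contain NA. On the second read it
# prints every field whose column number was never recorded.
awk 'NR == FNR { for (i = 1; i <= NF; i++) if ($i == "NA") skip[i] = 1; next }
     { out = ""
       for (i = 1; i <= NF; i++)
           if (!(i in skip)) out = out (out == "" ? "" : " ") $i
       print out
     }' inputfile inputfile > outputfile
```

Building each line in `out` before printing also avoids the trailing space that `printf("%s ", ...)` leaves after the last field.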
