81235

Reputation: 179

How to remove columns that contain NA in Linux

I would like to remove every column that contains at least one NA. I tried this command:

awk ' $0 !="NA" {print $0}' file

But it does not work. For example, the file looks like this:

1  2 3 NA  6  male
4  6 2 1   NA female
NA 2 2 NA  3  male
7  2 2 7   NA male

I want the output file to be:

  2 3 male
  6 2 female
  2 2 male
  2 2 male

Upvotes: 0

Views: 512

Answers (1)

Barmar

Reputation: 782106

Your command compares the whole line ($0) to "NA", so it can only drop lines that consist of exactly NA, never individual columns. You need to make two passes over the data. The first pass should save all the input in an array, find the column numbers that contain NA, and save those in another array. Then at the end you print all the saved data, but skip the columns that are in the second array.

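awk '{ lines[NR] = $0;                                          # save every input line
       for (i = 1; i <= NF; i++) if ($i == "NA") skip[i] = 1;   # remember columns that contain NA
     }
     END { for (i = 1; i <= NR; i++) {                          # replay the saved lines
            nf = split(lines[i], fields);
            sep = "";
            for (j = 1; j <= nf; j++)                           # print only columns not marked in skip
              if (!(j in skip)) { printf("%s%s", sep, fields[j]); sep = " "; }
            printf("\n");
           }
         }' inputfile > outputfile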
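
If the file is too large to keep in memory, a variant of the same idea is to let awk read the file twice instead of saving the lines in an array. This is only a minimal sketch, assuming the input is a regular file (not a pipe) with whitespace-separated fields; sample.txt and drop_na_columns are hypothetical names used for illustration.

```shell
# Recreate the sample data from the question (the file name is hypothetical).
cat > sample.txt <<'EOF'
1  2 3 NA  6  male
4  6 2 1   NA female
NA 2 2 NA  3  male
7  2 2 7   NA male
EOF

# drop_na_columns is a hypothetical helper name.
# The same file is given to awk twice; NR == FNR holds only while
# the first copy is being read.
drop_na_columns() {
  awk 'NR == FNR {                       # pass 1: mark columns that contain NA
         for (i = 1; i <= NF; i++) if ($i == "NA") skip[i] = 1
         next
       }
       {                                 # pass 2: print the unmarked columns
         sep = ""
         for (i = 1; i <= NF; i++)
           if (!(i in skip)) { printf "%s%s", sep, $i; sep = " " }
         print ""
       }' "$1" "$1"
}

drop_na_columns sample.txt
```

On the sample input this prints the four lines "2 3 male", "6 2 female", "2 2 male" and "2 2 male". Fields are joined with a single space, so the original column alignment is not preserved.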

Upvotes: 1
