DanS

Reputation: 47

Is there a way to use awk to REMOVE lines based on threshold value?

I have a bunch of identifiers in the first column and scores for individual samples (for those identifiers) in the next columns, like this:

ID       1         2          3
21       20        70         80
13       44        50         10

I know the awk syntax to count how many rows there are where every value is less than 20 (($2 < 20) && ($3 < 20) && ($4 < 20)), but I don't know how to filter those rows out.

If I instead do (($2 > 20) && ($3 > 20) && ($4 > 20)), print the matching rows and save them, the result is not the same. With the first condition, a row where only one value is less than 20 (e.g. 10 40 45) is still kept, because not ALL of its values are less than 20. With the > version, all values must be greater than 20, so that row would be deleted.

Can you please help me? Maybe I need sed? Thanks!

Upvotes: 1

Views: 397

Answers (3)

karakfa

Reputation: 67507

It's not very clear what you're asking, because you haven't provided the desired output. Your input file also seems to have a header, which adds to the confusion.

These are the alternatives you can use; the comment on each line says which records will be printed. You can extend them to additional columns.

   awk -v t=20 '$2<t && $3<t' file         # all strictly less
   awk -v t=20 '!($2<t && $3<t)' file      # any greater or equal 
   awk -v t=20 '$2<t || $3<t' file         # any strictly less 
   awk -v t=20 '!($2<t || $3<t)' file      # all greater or equal

Perhaps these basic equalities will help you understand:

  !(p && q) == !p || !q    # for logical p,q
  !(p || q) == !p && !q
     !(x<y) == x>=y        # for numerical x,y
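
Applied to the question as I read it (drop a row only when every score is below the threshold, and keep the header), the first identity gives a one-liner. This is only a sketch under assumptions: three score columns in $2 to $4, a header on line 1, and a hypothetical function name `drop_all_below` that reads the table on stdin.

```shell
# Hypothetical helper: $1 is the threshold, the table comes in on stdin.
drop_all_below() {
    # NR==1 keeps the header; the negation keeps every row that is
    # NOT "all three scores strictly below t".
    awk -v t="$1" 'NR==1 || !($2<t && $3<t && $4<t)'
}

# Sample data from the question plus one made-up row (7 5 3 19)
# in which every score is below 20.
printf 'ID 1 2 3\n21 20 70 80\n13 44 50 10\n7 5 3 19\n' | drop_all_below 20
```

Only the made-up row is removed here; the row with a single low score (13 44 50 10) stays, because not all of its scores are below the threshold.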

Upvotes: 2

Bertrand Martel

Reputation: 45432

You can iterate over the fields up to NF, flag the line if any value fails your condition, and print the whole line only when none does:

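awk '{
        # Skip the header line
        if (NR != 1) {
            remove = 0;
            # Start at field 2 so the ID in column 1 is not tested
            for (i = 2; i <= NF; i++) {
                if ($i < 20) {
                    remove = 1;
                    break;
                }
            }
            # Print the row only if no score was below 20
            if (remove == 0) {
                print $0
            }
        }
    }' test.txt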
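
If the goal is instead to drop a row only when all of its scores are below the threshold, the same loop can count the low scores rather than stop at the first one. The following is a rough sketch, not a definitive version: it assumes the ID is in column 1, the header is on line 1, and any number of score columns follow. The function name `drop_if_all_below` is made up for the example.

```shell
# Hypothetical helper: $1 is the threshold, the table comes in on stdin.
drop_if_all_below() {
    awk -v t="$1" '
        NR == 1 { print; next }            # pass the header through unchanged
        {
            below = 0
            for (i = 2; i <= NF; i++)      # skip the ID in column 1
                if ($i < t) below++
            if (below < NF - 1) print      # at least one score is >= t
        }'
}

# Made-up data with four score columns
printf 'ID a b c d\n1 5 5 5 5\n2 5 5 5 25\n3 30 30 30 30\n' | drop_if_all_below 20
```

With this input only row 1 is dropped, since it is the only one where every score is below 20.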

Upvotes: 3

George Vasiliou

Reputation: 6345

You are most probably doing something wrong. The statement "you will have instances in the first example where one value is less than 20 and the row is still kept because not ALL values are less than 20 (e.g. 10 40 45)" is not valid. && is a logical AND, and a chain of ANDs prints the row only if every condition is true. For 10 40 45 that is not the case, so the row is not printed:

$ echo "10        40         45" |awk '(($1<20) && ($2<20) && ($3<20))'
#Output : no output

If you want to keep the row above, then you need OR:

$ echo "10        40         45" |awk '(($1<20) || ($2<20) || ($3<20))'
#Output:
10        40         45

Similarly:

$ echo "10        40         45" |awk '(($1>20) && ($2>20) && ($3>20))'
# Output: No Output
$ echo "10        40         45" |awk '(($1>20) || ($2>20) || ($3>20))'
#Output:
10        40         45

Upvotes: 1
