Tom

Reputation: 469

Select lines based on value in a column

I have a tab-delimited table and want to print every line where the value in column X is greater than Y. I tried the code below, but I am new to awk and unsure how to filter on a column.

awk '$X >= Y {print} ' Table.txt | cat > Wanted_lines 

Y is a value from 1 to 100.

Suppose the input looked like this, with column X being the second column:

1    30
2    50
3    100
4    100
5    80
6    79
7    90

The wanted output would be:

3    100
4    100
5    80
7    90

The first 2 lines of the actual file are:

1   OTU1    243622  208679  121420  265864  0   0   2   0   0   11  1   5   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   839604  OTU1    -   Archaea 100%    Euryarchaeota   100%    Methanobacteria 100%    Methanobacteriales  100%    Methanobacteriaceae 100%    Methanobrevibacter  100%
2   OTU2    84366   120817  15834   74737   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   295755  OTU2    -   Archaea 100%    Euryarchaeota   100%    Methanobacteria 100%    Methanobacteriales  100%    Methanobacteriaceae 100%    Methanobrevibacter  100%

Upvotes: 6

Views: 12522

Answers (3)

Anthony Rutledge

Reputation: 7564

First

awk's default field separator (FS) splits on runs of spaces and tabs, so it works on both space-delimited and tab-delimited files.
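One caveat, shown as a minimal sketch: if a tab-delimited field itself contains a space, the default separator splits it in two and shifts the column numbers. The file name fs_demo.tsv and its contents are made up for illustration; setting the separator explicitly with -F'\t' should avoid the shift.

```shell
# Hypothetical tab-delimited sample; the first field contains a space.
printf 'sample A\t90\nsample B\t70\n' > fs_demo.tsv

# Default FS: "sample", "A" and "90" are three fields, so $2 is "A" or "B".
default_second=$(awk '{print $2}' fs_demo.tsv)

# Tab-only FS: $2 is the numeric column, so the comparison behaves as intended.
tab_filtered=$(awk -F'\t' '$2 >= 80' fs_demo.tsv)

printf '%s\n' "$default_second" "$tab_filtered"
```

If no field contains embedded spaces, the default separator is enough and -F can be left out.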

Secondly

awk '$x > FLOOR' Table.txt

Where $x is the target column, and FLOOR is the actual numeric floor (e.g. 5000).

Example file: awktest

500  100
400  1100
1000 400
1200 500


awk '$1 > 1000' awktest

1200 500

awk '$1 >= 1000' awktest

1000 400
1200 500

Thus, you should be able to use a relational expression to print the lines where x > y, in the form:

awk '$x > $y' awktest

Where $x is a numeric column as in $1, or other.

Where $y is a numeric column as in $2, or other.

Example:

awk '$1 > $2' awktest

or ...

awk '$2 > $1' awktest

awk numbers are floating point numbers, so you can compare decimals, too.
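A small sketch of a decimal comparison, assuming a made-up file decimals.txt. Because the field looks numeric, awk compares it as a number, so 10.25 counts as larger than 2.5 even though it would sort before it as a string.

```shell
# Hypothetical sample with decimal values in the second column.
printf '1 0.5\n2 2.75\n3 10.25\n' > decimals.txt

# Keep rows whose second column exceeds 2.5 (numeric, not string, comparison).
above=$(awk '$2 > 2.5' decimals.txt)

printf '%s\n' "$above"
```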

Upvotes: 5

ghoti

Reputation: 46826

So...

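  • In '$X >= Y {print}', the {print} is redundant, as the default action in awk is to print the matching line.
  • | cat > file is a useless use of cat (UUOC); redirect awk's output directly with > file.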
  • Your expected output shows lines where that value is 80 or above. This answer assumes the output is what you really want, despite the lack of code to handle it.
  • I don't see how your last input example relates to things. Is there particular output you'd like from that input?

Consider:

$ awk '$X >= Y' X=2 Y=80 input.txt
3    100
4    100
5    80
7    90
$ awk '$X >= Y' X=2 Y=90 input.txt
3    100
4    100
7    90

The notation above relies on the following statement from man awk:

Any file of the form var=value is treated as an assignment, not a filename, and is executed at the time it would have been opened if it were a filename.

This is functionally equivalent to:

$ awk -v X=2 -v Y=80 '$X >= Y' input.txt

Either of these notations for getting shell variables into your awk script will do just fine. I believe any version of awk you come across (bsdawk, gawk, mawk) should handle both equally well.
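One place the two notations do differ is timing: -v assignments are made before BEGIN runs, while var=value operands are only assigned when awk reaches them in the argument list, as the man page quote above describes. A minimal sketch, using a made-up file assign_demo.txt:

```shell
# Hypothetical two-line sample.
printf '1 30\n2 90\n' > assign_demo.txt

# -v: Y is already set inside BEGIN.
with_v=$(awk -v Y=80 'BEGIN { print "Y in BEGIN: [" Y "]" } $2 >= Y' assign_demo.txt)

# Operand form: Y is still empty in BEGIN, but set before the file is read.
as_operand=$(awk 'BEGIN { print "Y in BEGIN: [" Y "]" } $2 >= Y' Y=80 assign_demo.txt)

printf '%s\n' "$with_v" "$as_operand"
```

For a plain filter with no BEGIN block, as in this question, the difference should not matter.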

Within a shell script, you might see something like this:

#!/usr/bin/env bash

if [[ $# != 2 ]]; then
  printf 'Please supply column and floor values as parameters.\n'
  exit 1
elif [[ $1 =~ [^0-9] ]] || [[ $2 =~ [^0-9] ]]; then
  printf 'Invalid parameters.\n'
  exit 1
fi

awk '$X >= Y' X="$1" Y="$2" input.txt

Upvotes: 0

Juan Diego Godoy Robles

Reputation: 14945

Try:

awk -v num_col="$X" -v limit="$Y" '$num_col + 0 >= limit + 0' Table.txt > Wanted_lines

Example:

$ cat Table.txt
1    30
2    50
3    100
4    100
5    80
6    79
7    90


$ X=2
$ Y=80
$ awk -v num_col="$X" -v limit="$Y" '$num_col + 0 >= limit + 0' Table.txt
3    100
4    100
5    80
7    90

Alternatively (hacky and NOT recommended), the awk quoting could be broken up this way:

$ awk '$'"${X}"' + 0 >= '"${Y}"' + 0' Table.txt

This is what you need to get rid of the % symbol in your actual file:

$ awk -v num_col=43 -v limit=80 '{sub(/%/,"",$num_col)}$num_col + 0 >= limit + 0 ' Table.txt
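A side note on the % handling, as a sketch with a made-up two-column file percent_demo.tsv. In the awk implementations I know of, string-to-number conversion reads the leading numeric prefix, so "100%" + 0 evaluates to 100 and the comparison can work without sub(). Assigning to a field also makes awk rebuild the record with the output separator (a single space by default), so the sub() version prints space-joined lines rather than the original tabs.

```shell
# Hypothetical tab-delimited sample with a percent sign in column 2.
printf 'Archaea\t100%%\nBacteria\t75%%\n' > percent_demo.tsv

# "+ 0" alone coerces "100%" to 100; the line is printed unchanged, tab included.
kept=$(awk -F'\t' -v num_col=2 -v limit=80 '$num_col + 0 >= limit + 0' percent_demo.tsv)

# With sub(), the field is modified, so the record is rebuilt with a space.
rebuilt=$(awk -F'\t' -v num_col=2 -v limit=80 \
  '{ sub(/%/, "", $num_col) } $num_col + 0 >= limit + 0' percent_demo.tsv)

printf '%s\n' "$kept" "$rebuilt"
```

If the tabs need to survive the sub() version, adding -v OFS='\t' should keep them.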

Upvotes: 0
