Select lines based on value in a column

I have a tab-delimited table, and I want to print all lines where the value in column X is greater than or equal to Y. I have attempted the code below, but I am new to awk, so I am unsure how to select lines based on a column.

awk '$X >= Y {print} ' Table.txt | cat > Wanted_lines

Y is a value from 1 to 100.

For example, suppose the input looks like the following, with X being the second column.

The wanted output would be:

The first two lines of the actual file are:

1   OTU1    243622  208679  121420  265864  0   0   2   0   0   11  1   5   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   839604  OTU1    -   Archaea 100%    Euryarchaeota   100%    Methanobacteria 100%    Methanobacteriales  100%    Methanobacteriaceae 100%    Methanobrevibacter  100%
2   OTU2    84366   120817  15834   74737   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   295755  OTU2    -   Archaea 100%    Euryarchaeota   100%    Methanobacteria 100%    Methanobacteriales  100%    Methanobacteriaceae 100%    Methanobrevibacter  100%

Upvotes: 6

Answers (3)

Anthony Rutledge

First

awk's default field separator (FS) splits fields on runs of spaces and tabs, so it works on space- or tab-delimited files without any extra options.
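One thing to watch with the default FS on tab-delimited data: consecutive tabs count as a single separator, so an empty field disappears and the later columns shift left. The sketch below uses a made-up two-line sample (written to a temporary file) and compares the default FS with -F'\t':

```shell
#!/usr/bin/env bash
# Made-up tab-delimited sample; the second field is empty on line 2.
sample=$(mktemp)
printf 'a\tb\tc\n' > "$sample"
printf 'a\t\tc\n' >> "$sample"

# Default FS: runs of blanks/tabs act as ONE separator, so the empty
# field on line 2 vanishes and "c" becomes field 2.
default_fs=$(awk '{ print NF ":" $2 }' "$sample")

# Explicit tab FS: every tab separates fields, so the empty field is kept.
tab_fs=$(awk -F'\t' '{ print NF ":" $2 }' "$sample")

printf 'default FS:\n%s\ntab FS:\n%s\n' "$default_fs" "$tab_fs"
rm -f "$sample"
```

If the file never has empty fields, both forms number the columns the same way.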

Second

awk '$x > FLOOR' Table.txt

where $x is the target column (e.g. $2) and FLOOR is the actual numeric floor (e.g. 5000).

Example file: awktest

500  100
400  1100
1000 400
1200 500


awk '$1 > 1000' awktest

1200   500

awk '$1 >= 1000' awktest

1000   400 
1200   500

Thus, you should be able to use a relational expression to print the lines where x > y, in the form:

awk '$x > $y' awktest

where $x and $y are numeric columns, such as $1 and $2.

Example:

awk '$1 > $2' awktest

or ...

awk '$2 > $1' awktest
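For reference, here is what those two commands print on the awktest file above. This sketch recreates the file in a temporary location so it runs on its own:

```shell
#!/usr/bin/env bash
# Recreate the example file "awktest" from this answer.
awktest=$(mktemp)
printf '%s\n' '500  100' '400  1100' '1000 400' '1200 500' > "$awktest"

# Column 1 greater than column 2: 500>100, 1000>400, 1200>500.
col1_gt_col2=$(awk '$1 > $2' "$awktest")

# Column 2 greater than column 1: only 1100>400.
col2_gt_col1=$(awk '$2 > $1' "$awktest")

printf '%s\n---\n%s\n' "$col1_gt_col2" "$col2_gt_col1"
rm -f "$awktest"
```

The selected lines are printed unchanged, including their original spacing, because no field is modified.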

awk stores numbers as floating-point values, so you can compare decimals, too.
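A quick check of decimal comparison on a few made-up values. Note that > is strict, so 1.2 itself is not printed:

```shell
#!/usr/bin/env bash
# Fields that look like numbers, decimals included, compare numerically.
decimals=$(printf '%s\n' 1.5 2.25 0.9 1.2 | awk '$1 > 1.2')
printf '%s\n' "$decimals"
```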

Upvotes: 5

ghoti

So...

The {print} in '$X >= Y {print}' is redundant, as awk's default action is to print the current line.
| cat > file is a Useless Use of Cat (UUOC); redirect with > file directly.
Your expected output shows lines where that value is 80 or above. This answer assumes that output is what you really want, despite the lack of code to produce it.
I don't see how your last input example relates to the question. Is there particular output you'd like from that input?
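To make the first two points concrete, this sketch runs the pattern with and without an explicit {print}, redirecting straight to a file with no cat. The sample is the table used in the last answer, and X=2, Y=80 are hard-coded here for illustration:

```shell
#!/usr/bin/env bash
# Sample table from the last answer: column 2 holds the value to test.
table=$(mktemp)
printf '1\t30\n2\t50\n3\t100\n4\t100\n5\t80\n6\t79\n7\t90\n' > "$table"
with_print=$(mktemp)
without_print=$(mktemp)

# Explicit print action...
awk '$2 >= 80 { print }' "$table" > "$with_print"
# ...versus the bare pattern: awk's default action prints the line.
# Redirecting with > directly replaces the "| cat > file" pipeline.
awk '$2 >= 80' "$table" > "$without_print"

if cmp -s "$with_print" "$without_print"; then same=yes; else same=no; fi
wanted=$(cat "$without_print")
printf 'identical: %s\n%s\n' "$same" "$wanted"
rm -f "$table" "$with_print" "$without_print"
```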

Consider:

$ awk '$X >= Y' X=2 Y=80 input.txt
3    100
4    100
5    80
7    90
$ awk '$X >= Y' X=2 Y=90 input.txt
3    100
4    100
7    90

The notation above relies on the following statement from man awk:

Any file of the form var=value is treated as an assignment, not a filename, and is executed at the time it would have been opened if it were a filename.

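This is functionally equivalent to the following (one difference: -v assignments already apply inside a BEGIN block, while var=value operands are processed only after BEGIN has run):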

$ awk -v X=2 -v Y=80 '$X >= Y' input.txt

Either notation works for getting shell variables into your awk script, and I believe any version of awk you come across (bsdawk, gawk, mawk) handles both equally well.
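The two notations only behave differently when a BEGIN block reads the variable, because a var=value operand is processed when awk reaches it in the argument list, after BEGIN. A small sketch on a throwaway one-line input file:

```shell
#!/usr/bin/env bash
# Throwaway one-line input file.
input=$(mktemp)
printf 'line\n' > "$input"

# -v assignments happen before the BEGIN block runs.
with_v=$(awk -v Y=80 'BEGIN { print "BEGIN sees Y=[" Y "]" }' "$input")

# A Y=80 operand is applied only when awk reaches it, i.e. after BEGIN,
# but before the file that follows it is read (so END sees it).
with_operand=$(awk 'BEGIN { print "BEGIN sees Y=[" Y "]" }
                    END   { print "END sees Y=[" Y "]" }' Y=80 "$input")

printf '%s\n---\n%s\n' "$with_v" "$with_operand"
rm -f "$input"
```

For a plain pattern like '$X >= Y', which has no BEGIN block, either form gives the same result.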

Within a shell script, you might see something like this:

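#!/usr/bin/env bash

# Usage: <script> COLUMN FLOOR
# Prints the lines of input.txt whose column COLUMN is >= FLOOR.

if (( $# != 2 )); then
  printf 'Please supply column and floor values as parameters.\n' >&2
  exit 1
elif [[ ! $1 =~ ^[0-9]+$ ]] || [[ ! $2 =~ ^[0-9]+$ ]]; then
  # Both parameters must be non-empty strings of digits.
  printf 'Invalid parameters.\n' >&2
  exit 1
fi

awk '$X >= Y' X="$1" Y="$2" input.txt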

Upvotes: 0

Juan Diego Godoy Robles

Try:

awk -v num_col="$X" -v limit="$Y" '$num_col + 0 >= limit + 0' Table.txt > Wanted_lines

Example:

$ cat Table.txt
1    30
2    50
3    100
4    100
5    80
6    79
7    90


$ X=2
$ Y=80
$ awk -v num_col="$X" -v limit="$Y" '$num_col + 0 >= limit + 0' Table.txt
3    100
4    100
5    80
7    90
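The + 0 forces a numeric comparison. It matters when a field does not look like a plain number, for example "100%": awk then compares it to limit as a string, and "100%" sorts before "80". A sketch on a made-up tab-delimited sample:

```shell
#!/usr/bin/env bash
# Made-up sample: column 2 carries a trailing percent sign.
pct=$(mktemp)
printf 'a\t100%%\nb\t79%%\nc\t80%%\n' > "$pct"

# Without + 0: "100%" is not numeric-looking, so this is a STRING
# comparison and the 100% line is wrongly rejected.
as_string=$(awk -v num_col=2 -v limit=80 '$num_col >= limit' "$pct")

# With + 0: both sides become numbers; "100%" converts to 100 via its
# leading numeric prefix, so the comparison is numeric.
as_number=$(awk -v num_col=2 -v limit=80 '$num_col + 0 >= limit + 0' "$pct")

printf 'string:\n%s\nnumeric:\n%s\n' "$as_string" "$as_number"
rm -f "$pct"
```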

Alternatively (hacky and NOT recommended), you can close the single-quoted awk program, splice in the shell variables, and reopen it:

$ awk '$'"${X}"' + 0 >= '"${Y}"' + 0' Table.txt

To strip the % symbol from the column in your actual file before comparing, use sub(); setting OFS to a tab keeps the printed lines tab-delimited:

$ awk -v OFS='\t' -v num_col=43 -v limit=80 '{sub(/%/,"",$num_col)} $num_col + 0 >= limit + 0' Table.txt
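Why OFS matters here: changing a field with sub() makes awk rebuild the whole line using OFS, which is a single space by default. A sketch on a made-up three-column sample, with column 3 standing in for column 43:

```shell
#!/usr/bin/env bash
# Made-up tab-delimited sample; column 3 plays the role of column 43.
sample=$(mktemp)
printf 'x\ty\t100%%\nx\ty\t79%%\n' > "$sample"

# Default OFS: the edited line is rebuilt with spaces between fields.
default_ofs=$(awk -v num_col=3 -v limit=80 \
  '{ sub(/%/, "", $num_col) } $num_col + 0 >= limit + 0' "$sample")

# OFS set to a tab: the rebuilt line stays tab-delimited.
tab_ofs=$(awk -v OFS='\t' -v num_col=3 -v limit=80 \
  '{ sub(/%/, "", $num_col) } $num_col + 0 >= limit + 0' "$sample")

printf '%s\n%s\n' "$default_ofs" "$tab_ofs"
rm -f "$sample"
```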

Upvotes: 0
