Reputation: 469
I have a tab-delimited table and want to print all lines where column X is greater than or equal to a value Y. I have attempted the code below, but I am new to awk and unsure how to filter on a column.
awk '$X >= Y {print} ' Table.txt | cat > Wanted_lines
Y is a value from 1 to 100.
Suppose the input looks like the sample below, with column X being the second column and Y being 80.
1 30
2 50
3 100
4 100
5 80
6 79
7 90
The wanted output would be:
3 100
4 100
5 80
7 90
The first 2 lines of the file are:
1 OTU1 243622 208679 121420 265864 0 0 2 0 0 11 1 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 839604 OTU1 - Archaea 100% Euryarchaeota 100% Methanobacteria 100% Methanobacteriales 100% Methanobacteriaceae 100% Methanobrevibacter 100%
2 OTU2 84366 120817 15834 74737 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 295755 OTU2 - Archaea 100% Euryarchaeota 100% Methanobacteria 100% Methanobacteriales 100% Methanobacteriaceae 100% Methanobrevibacter 100%
Upvotes: 6
Views: 12522
Reputation: 7564
First, awk's default field separator (FS) splits on runs of spaces and tabs, so it works on space- or tab-delimited files.
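One caveat with the default separator: because it splits on any run of blanks, a tab-delimited file whose cells contain spaces (or are empty) ends up with shifted columns. The sketch below uses a hypothetical two-line sample, not the asker's file, to show the difference when the separator is set to a tab explicitly with -F; the variable names are made up for illustration.

```shell
# Hypothetical sample: the first column contains a space, columns are tab-separated.
sample=$(printf 'OTU 1\t30\nOTU 2\t100\n')

# Default FS: "OTU" is $1, "1" or "2" is $2, and the count lands in $3,
# so the test on $2 matches nothing.
default_fs=$(printf '%s\n' "$sample" | awk '$2 >= 80')

# Tab as FS: the count is $2, as intended.
tab_fs=$(printf '%s\n' "$sample" | awk -F'\t' '$2 >= 80')

printf 'default FS: [%s]\n' "$default_fs"
printf 'tab FS:     [%s]\n' "$tab_fs"
```

If every cell is non-empty and free of spaces, as in the sample lines in the question, the default separator gives the same result.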
Secondly
awk '$x > FLOOR' Table.txt
Where $x is the target column (e.g. $2), and FLOOR is the actual numeric floor (e.g. 5000).
Example file: awktest
500 100
400 1100
1000 400
1200 500
awk '$1 > 1000' awktest
1200 500
awk '$1 >= 1000' awktest
1000 400
1200 500
Thus, you should be able to use a relational expression to print the lines where x > y, in the form:
awk '$x > $y' awktest
Where $x is a numeric column such as $1, and $y is a numeric column such as $2.
Example:
awk '$1 > $2' awktest
or ...
awk '$2 > $1' awktest
awk numbers are floating point numbers, so you can compare decimals, too.
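As a quick illustration of the decimal comparison, here is a minimal sketch on a hypothetical three-line sample (the data and the variable name are invented for the example):

```shell
# Hypothetical sample with decimal values in column 2.
decimals=$(printf '1 2.5\n2 10.25\n3 3.10\n' | awk '$2 > 3.1')

# Only the 10.25 line is kept: 2.5 is smaller, and 3.10 equals 3.1
# numerically, so it fails a strict "greater than" test.
printf '%s\n' "$decimals"
```

Note that 10.25 is compared as a number here; a plain string comparison would sort it before 3.1.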
Upvotes: 5
Reputation: 46826
So...
The {print} in '$X >= Y {print}' is redundant, as the default action in awk is to print.
And | cat > file is a UUOC (useless use of cat); redirect awk's output directly instead.
Consider:
$ awk '$X >= Y' X=2 Y=80 input.txt
3 100
4 100
5 80
7 90
$ awk '$X >= Y' X=2 Y=90 input.txt
3 100
4 100
7 90
The notation above relies on the following statement from man awk:
Any file of the form var=value is treated as an assignment, not a filename, and is executed at the time it would have been opened if it were a filename.
For a script like this one, which has no BEGIN block, this is functionally equivalent to:
$ awk -v X=2 -v Y=80 '$X >= Y' input.txt
Either of these notations for getting shell variables into your awk script will do just fine. I believe any version of awk you come across (bsdawk, gawk, mawk) should handle both equally well.
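The two notations do differ in one respect that does not matter for the one-pattern script here: -v assignments are made before any BEGIN block runs, while var=value operands are only processed when awk reaches them in the argument list. A minimal sketch of that difference (the shell variable names are just for illustration):

```shell
# -v assigns X before the BEGIN block runs.
with_v=$(awk -v X=2 'BEGIN { print "X=" X }')

# An operand assignment is processed only when awk reaches it in the
# argument list, which is after BEGIN has already run, so X is still empty.
as_operand=$(awk 'BEGIN { print "X=" X }' X=2 </dev/null)

printf '%s\n' "$with_v"       # X=2
printf '%s\n' "$as_operand"   # X=
```

So if the script needs the value inside BEGIN, -v is the notation to use.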
Within a shell script, you might see something like this:
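#!/usr/bin/env bash
# Usage: pass the column number and the floor value as two parameters.
if [[ $# -ne 2 ]]; then
    printf 'Please supply column and floor values as parameters.\n' >&2
    exit 1
# Reject any parameter that is not a non-empty string of digits.
elif ! [[ $1 =~ ^[0-9]+$ && $2 =~ ^[0-9]+$ ]]; then
    printf 'Invalid parameters.\n' >&2
    exit 1
fi
# Print the lines of input.txt whose column $1 is at least $2.
awk '$X >= Y' X="$1" Y="$2" input.txt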
Upvotes: 0
Reputation: 14945
Try:
awk -v num_col="$X" -v limit="$Y" '$num_col + 0 >= limit + 0' Table.txt > Wanted_lines
Example:
$ cat Table.txt
1 30
2 50
3 100
4 100
5 80
6 79
7 90
$ X=2
$ Y=80
$ awk -v num_col="$X" -v limit="$Y" '$num_col + 0 >= limit + 0' Table.txt
3 100
4 100
5 80
7 90
Alternatively (hacky and NOT recommended), the awk quoting could be broken up to splice in the shell variables this way:
$ awk '$'"${X}"' + 0 >= '"${Y}"' + 0' Table.txt
This is what you need to get rid of the % symbol in your actual file:
$ awk -v num_col=43 -v limit=80 '{sub(/%/,"",$num_col)}$num_col + 0 >= limit + 0 ' Table.txt
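One side effect to be aware of: assigning to a field (which sub on $num_col does) makes awk rebuild the line with the output separator, a single space by default, and the % is gone from the printed line. If the original tab-delimited line should be printed unchanged, a possible variant is to strip the % from a copy of the field instead. This is a minimal sketch on a hypothetical three-line sample with the percentage in column 2; the sample data and the value variable are assumptions for illustration.

```shell
# Hypothetical tab-delimited sample with a percentage in column 2.
sample=$(printf 'OTU1\t100%%\nOTU2\t79%%\nOTU3\t80%%\n')

kept=$(printf '%s\n' "$sample" | awk -F'\t' -v num_col=2 -v limit=80 '
    { value = $num_col; sub(/%/, "", value) }   # strip % from a copy only
    value + 0 >= limit + 0                      # print the untouched line
')

printf '%s\n' "$kept"
```

The lines for OTU1 and OTU3 are printed with their tabs and % signs intact.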
Upvotes: 0