Reputation: 2715
I have a tab delimited file such as
Jack 2 98 F
Jones 6 25 51.77
Mike 8 11 61.70
Gareth 1 85 F
Simon 4 76 4.79
Mark 11 12 38.83
Tony 7 82 F
Lewis 19 17 12.83
James 12 1 88.83
I want to find the N minimum values and N maximum values (more than 5) in th the last print the rows that has those values. I want to ignore the rows with E. For example, if I want minimum two values and maximum in above data, my output would be
Minimum case
Simon 4 76 4.79
Lewis 19 17 12.83
Maximum case
James 12 1 88.83
Mike 8 11 61.70
I can ignore the columns that does not have numeric value in fourth column using
awk -F "\t" '$4+0 != $4{next}1' inputfile.txt
I can also pipe this output and find one minimum value using
awk -F "\t" '$4+0 != $4{next}1' inputfile.txt |awk 'NR == 1 || $4 < min {line = $0; min = $4}END{print line}'
and similarly for maximum value, but how can I extend this to more than one values like 2 values in the toy example above and 10 cases for my real data.
Upvotes: 0
Views: 374
Reputation: 42411
Here's a pipeline approach to the problem.
$ grep -v 'F$' inputfile.txt | sort -nk 4 | head -2
Simon 4 76 4.79
Lewis 19 17 12.83
$ grep -v 'F$' inputfile.txt | sort -rnk 4 | tail -2
Mike 8 11 61.70
James 12 1 88.83
Upvotes: 0
Reputation: 98388
You can get the minimum and maximum at once with a little redirection:
minmaxlines=2
( ( grep -v 'F$' inputfile.txt | sort -n -k4 | tee /dev/fd/4 | head -n $minmaxlines >&3 ) 4>&1 | tail -n $minmaxlines ) 3>&1
Upvotes: 1
Reputation: 195039
n
could be a variable. in this case, I set n=3
. not, this may have problem if there are lines with same value in last col.
kent$ awk -v n=3 '$NF+0==$NF{a[$NF]=$0}
END{ asorti(a,k,"@ind_num_asc")
print "min:"
for(i=1;i<=n;i++) print a[k[i]]
print "max:"
for(i=length(a)-n+1;i<=length(a);i++)print a[k[i]]}' f
min:
Simon 4 76 4.79
Lewis 19 17 12.83
Mark 11 12 38.83
max:
Jones 6 25 51.77
Mike 8 11 61.70
James 12 1 88.83
Upvotes: 1