discipulus

Find "N" minimum and "N" maximum values with respect to a column in the file and print the specific rows

I have a tab-delimited file such as:

Jack    2   98  F
Jones   6   25  51.77
Mike    8   11  61.70
Gareth  1   85  F
Simon   4   76  4.79
Mark    11  12  38.83
Tony    7   82  F
Lewis   19  17  12.83
James   12  1   88.83

I want to find the N minimum and N maximum values (N is more than 5 for my real data) in the last column and print the rows that have those values. I want to ignore the rows with F. For example, if I want the two minimum and two maximum values in the data above, my output would be:

Minimum case

Simon   4   76  4.79
Lewis   19  17  12.83

Maximum case

James   12  1   88.83
Mike    8   11  61.70

I can skip the rows that do not have a numeric value in the fourth column using

awk -F "\t" '$4+0 != $4{next}1' inputfile.txt

I can also pipe this output into a second awk command and find the single minimum value using

awk -F "\t" '$4+0 != $4{next}1' inputfile.txt |awk 'NR == 1 || $4 < min {line = $0; min = $4}END{print line}'

and similarly for the maximum value. How can I extend this to more than one value, e.g. two values in the toy example above and ten for my real data?
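The single-minimum awk above can be extended to N rows by keeping two small sorted buffers instead of one variable. The following is a minimal sketch in POSIX awk, not one of the answers below; it assumes tab-delimited input with the value in column 4, reuses the question's $4+0 != $4 test to skip rows such as the F ones, and the wrapper name min_max_rows is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: min_max_rows N [FILE...]
# One pass over tab-delimited input; keeps the N rows with the smallest and
# the N rows with the largest numeric 4th field, using insertion into small arrays.
# Rows whose 4th field is not numeric (e.g. "F") are skipped.
min_max_rows() {
    n=$1
    shift
    awk -F '\t' -v n="$n" '
    $4+0 != $4 { next }
    {
        v = $4 + 0
        # lov/lol: values and lines of the n smallest rows, kept ascending
        if (nlo < n || v < lov[nlo]) {
            if (nlo < n) nlo++
            for (i = nlo; i > 1 && lov[i-1] > v; i--) { lov[i] = lov[i-1]; lol[i] = lol[i-1] }
            lov[i] = v; lol[i] = $0
        }
        # hiv/hil: values and lines of the n largest rows, kept descending
        if (nhi < n || v > hiv[nhi]) {
            if (nhi < n) nhi++
            for (i = nhi; i > 1 && hiv[i-1] < v; i--) { hiv[i] = hiv[i-1]; hil[i] = hil[i-1] }
            hiv[i] = v; hil[i] = $0
        }
    }
    END {
        print "Minimum case"
        for (i = 1; i <= nlo; i++) print lol[i]
        print "Maximum case"
        for (i = 1; i <= nhi; i++) print hil[i]
    }' "$@"
}
```

Usage would be, e.g., min_max_rows 2 inputfile.txt for the toy data or min_max_rows 10 inputfile.txt for the real data. Each row is compared against at most N stored rows rather than sorting the whole file, and rows with equal values are stored separately, so one row cannot overwrite another with the same value.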

Answers (3)

FMc

Here's a pipeline approach to the problem.

$ grep -v 'F$' inputfile.txt | sort -nk 4  | head -2
Simon   4   76  4.79
Lewis   19  17  12.83

$ grep -v 'F$' inputfile.txt | sort -nk 4 | tail -2
Mike    8   11  61.70
James   12  1   88.83
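A minimal sketch that generalises this pipeline to any N and prints the labels used in the question. It assumes tab-delimited input and, as a swapped-in choice, uses the question's awk numeric test instead of grep -v 'F$', so any non-numeric fourth field is dropped, not just F; the function name min_max_sort is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: min_max_sort N FILE...
# Drops rows whose 4th field is not numeric, then sorts twice:
# ascending for the N smallest, descending for the N largest.
min_max_sort() {
    n=$1
    shift
    tab=$(printf '\t')
    echo "Minimum case"
    awk -F '\t' '$4+0 == $4' "$@" | sort -t "$tab" -k4,4n | head -n "$n"
    echo "Maximum case"
    awk -F '\t' '$4+0 == $4' "$@" | sort -t "$tab" -k4,4nr | head -n "$n"
}
```

Because it reads the input twice, it expects a file name (e.g. min_max_sort 10 inputfile.txt) rather than a pipe. The second sort uses the r key modifier so the largest value comes first, as in the question's expected output.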

ysth

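You can get the minimum and maximum at once with a little redirection (this relies on the system providing /dev/fd):

minmaxlines=2
( ( grep -v 'F$' inputfile.txt | sort -n -k4 | tee /dev/fd/4 | sed -n "1,${minmaxlines}p" >&3 ) 4>&1 | tail -n "$minmaxlines" ) 3>&1

The sed command prints the first minmaxlines lines but, unlike head, keeps reading to the end of its input, so tee is not killed by SIGPIPE before tail has received all of the sorted data (which can happen with head on larger files).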
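The same sort-once idea can also be sketched with a temporary file instead of the /dev/fd redirection, which may be easier to read and does not depend on /dev/fd. This assumes the mktemp utility is available (as it is on GNU/Linux, the BSDs and macOS) and tab-delimited input; the function name min_max_once is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: min_max_once N [FILE...]
# Sorts the numeric rows once into a temporary file, then reads the
# first N lines (minimum) and the last N lines (maximum) from it.
min_max_once() {
    n=$1
    shift
    tmp=$(mktemp) || return 1
    awk -F '\t' '$4+0 == $4' "$@" | sort -t "$(printf '\t')" -k4,4n > "$tmp"
    echo "Minimum case"
    head -n "$n" "$tmp"
    echo "Maximum case"
    tail -n "$n" "$tmp"
    rm -f "$tmp"
}
```

It reads its input only once, so it works both with a file argument (min_max_once 2 inputfile.txt) and on standard input. As in the answers above, the maximum rows come out in ascending order.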

Kent

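n is passed in as a variable; in this case, I set n=3. Note: this needs GNU awk (asorti with "@ind_num_asc" is a gawk extension), and it may have a problem if there are lines with the same value in the last column, because a later line overwrites an earlier one in a[$NF].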

kent$  awk -v n=3 '$NF+0==$NF{a[$NF]=$0}
        END{ asorti(a,k,"@ind_num_asc")
                print "min:"
                for(i=1;i<=n;i++) print a[k[i]]
                print "max:"
                for(i=length(a)-n+1;i<=length(a);i++)print a[k[i]]}' f
min:
Simon   4   76  4.79
Lewis   19  17  12.83
Mark    11  12  38.83
max:
Jones   6   25  51.77
Mike    8   11  61.70
James   12  1   88.83
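The duplicate-value problem mentioned above comes from a[$NF]=$0, where a later row with the same last-column value replaces an earlier one. A minimal sketch of one way around it, still assuming GNU awk as in this answer: key the rows by record number and let gawk's PROCINFO["sorted_in"] = "@val_num_asc" walk them in ascending order of value. It also caps n at the number of numeric rows. The function name min_max_gawk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: min_max_gawk N [FILE...]  (requires GNU awk)
# Rows are stored by record number, so rows with equal last-column values
# no longer overwrite each other as they do with a[$NF]=$0.
min_max_gawk() {
    n=$1
    shift
    gawk -v n="$n" '
    $NF+0 == $NF { line[NR] = $0; val[NR] = $NF + 0 }
    END {
        PROCINFO["sorted_in"] = "@val_num_asc"   # for-in over val goes by value, ascending
        m = 0
        for (r in val) order[++m] = r            # order[1..m]: record numbers sorted by value
        if (n > m) n = m                         # cap n at the number of numeric rows
        print "min:"
        for (i = 1; i <= n; i++) print line[order[i]]
        print "max:"
        for (i = m - n + 1; i <= m; i++) print line[order[i]]
    }' "$@"
}
```

For the sample data, min_max_gawk 3 f should print the same rows as above; the difference only shows when two rows share a value, in which case both are kept (their relative order then follows gawk's tie-breaking).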
