user1999357
user1999357

Reputation: 113

awk print row where column 1 matches and column three is highest value

I am looking to print columns 1 & 2 where column 1 matches and column 3 is the highest value. I am currently using awk and sort to get this type of output:

    EXCEPTION 91 3
    EXCEPTION 15 5
    TEST 5 1
    TEST 1 8

the end desired output I am looking for:

EXCEPTION 15
TEST 1

Here is a file example and the commands I am running to get the uniq counts. What I would really like is for sort to print the last record in the uniq sort

EXCEPTION 15

so I don't have to all the crazy uniq --count logic. IE: I want to know if column1 matches >= 3times and print the last recorded column two value for that match.

 cat /tmp/testing.txt |grep EXCEPTION
    EXCEPTION 15
    EXCEPTION 15
    EXCEPTION 15
    EXCEPTION 91
    EXCEPTION 91
    EXCEPTION 91
    EXCEPTION 91
    EXCEPTION 15
    EXCEPTION 15

    cat /tmp/testing.txt|awk '{print $1 " " $2}'|sed '/^$/d'| awk '$2 >= '1' '|sort |uniq --count|awk '{print $1" "$2" "$3}'|awk '$1 >= '3''|awk '{print $1" "$2" "$3}'|awk '{print $2" "$3" "$1}'
    EXCEPTION 15 5
    EXCEPTION 91 4

Upvotes: 0

Views: 1097

Answers (2)

Qualia
Qualia

Reputation: 729

You said you didn't want horrible uniq logic... but in case you change your mind, this task does fit quite neatly into sort/uniq/cut's purview (though this isn't as efficient as the awk solution).

From your testing file you can get your desired output with

sort -k1,2n < testing | 
    uniq -c |
    sort -k2,2 -k1rn,1 |
    cut -c8- |
    sort -u -k1,1

In order: sort by first column alphabetically (default) and then second column numerically - this puts identical lines in sequence.

Then count the occurences of each line, prepend 8 characters to each line containing the count along with whitespaces.

Sort by count descending, after sorting by the string (EXCEPTION, TEST) which is now the second field.

Remove the first 8 characters from each line (the counts).

Finally "sort" by the string and only output uniques. As the record you're interested in has been sorted to the top, this is the one it outputs. This can be thought of as "uniq by field."

(If you want to remove the trailing spaces from your input you can replace the cut command with sed 's/^ *[0-9]\+ *//')

Upvotes: 0

fedorqui
fedorqui

Reputation: 289955

Just keep track of the maximums for any given 1st field and store its corresponding 2nd field:

awk '{if ($3>max[$1])
         {max[$1]=$3; val[$1]=$2}
     } 
     END {for (i in val) print i, val[i]}' file

Test

$ awk '{if ($3>max[$1]) {max[$1]=$3; val[$1]=$2}} END {for (i in val) print i, val[i]}' file
EXCEPTION 15
TEST 1

Upvotes: 1

Related Questions