Fabio Barteri

Reputation: 9

How can I select lines based on a value in awk?

Let's assume I have a file whose structure looks like this:

AAAA 700 something1 something_else1
AAAA 98 something2 something_else2
AAAA 2000 something3 something_else3
BBBB 200 something4 something_else4
BBBB 21 something5 something_else5
BBBB 300 something6 something_else6

I need to extract, for each distinct value in column $1, the whole line that has the highest value in column $2. For example, for the key AAAA, I need to print the line in which $2=2000. The output should thus look like:

AAAA 2000 something3 something_else3
BBBB 300 something6 something_else6

I did it with Python, but the file is huge and the process is very time-consuming. Is there any way to do it with awk?

Upvotes: 0

Views: 59

Answers (3)

karakfa

Reputation: 67467

A combination of sort and awk will be easiest:

$ sort -k1,1 -k2,2nr file | awk '!a[$1]++'

AAAA 2000 something3 something_else3
BBBB 300 something6 something_else6

This sorts by the first field, then by the second field numerically in descending order, so awk only has to keep the first row of each group, which is the one with the highest value.
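As a rough illustration of the key handling, here is a minimal sketch that wraps the pipeline in a shell function and runs it on input whose groups are interleaved. The function name max_per_key_sorted and the sample rows are made up for this example; it assumes a POSIX-style sort in which each -k option names its own key.

```shell
# Hypothetical helper: print, for each value of field 1, the row with the
# largest numeric field 2. Reads the named files, or stdin if none are given.
max_per_key_sorted() {
  # -k1,1   : first key is field 1 only (groups rows with the same key)
  # -k2,2nr : second key is field 2 only, numeric, descending
  # awk then keeps only the first row seen for each value of field 1.
  sort -k1,1 -k2,2nr "$@" | awk '!seen[$1]++'
}

# Illustrative input: the groups do not need to be contiguous,
# because sort brings them together first.
printf '%s\n' 'AAAA 700 x' 'BBBB 21 z' 'AAAA 98 y' 'BBBB 300 w' |
  max_per_key_sorted
```

Giving each key its own -k option keeps the numeric comparison restricted to the second field; the price is that sort has to process the whole file before awk sees the first row.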

Upvotes: 1

Jose Ricardo Bustos M.

Reputation: 8164

You can try:

awk '
!($1 in max) || ($2>max[$1]) {
  max[$1]=$2; a[$1]=$0;
} 
END{ 
  for(i in a){ 
    print a[i];
  }
}' input_file

This gives the following output. The order may differ, because for (i in a) visits the array elements in an unspecified order:

BBBB 300 something6 something_else6
AAAA 2000 something3 something_else3
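If the output order matters, one possible variation is to record each key the first time it appears and walk that list in the END block. This is only a sketch: the function name max_per_key_ordered and the array names order and line are invented here, and the "+ 0" is added to force a numeric comparison.

```shell
# Hypothetical helper: same idea as the array-based answer, but keys are
# printed in the order in which they were first seen.
max_per_key_ordered() {
  awk '
    # First time this key appears: remember its position and take the row.
    !($1 in max) { order[++n] = $1; max[$1] = $2 + 0; line[$1] = $0; next }
    # Later rows replace the stored one only if field 2 is strictly larger.
    $2 + 0 > max[$1] { max[$1] = $2 + 0; line[$1] = $0 }
    # Walk the keys by index so the order is deterministic.
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  ' "$@"
}

# Illustrative input with interleaved groups.
printf '%s\n' 'BBBB 200 a' 'AAAA 700 b' 'BBBB 300 c' 'AAAA 2000 d' |
  max_per_key_ordered
```

Like the original, this keeps one row per distinct key in memory, so it suits files with many rows but a manageable number of keys.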

Upvotes: 1

Ed Morton

Reputation: 203229

$ cat tst.awk
$1!=prev { if (rec!="") print rec; max=$2; rec=$0 }
$2 > max { max=$2; rec=$0 }
{ prev=$1 }
END { if (rec!="") print rec }

$ awk -f tst.awk file
AAAA 2000 something3 something_else3
BBBB 300 something6 something_else6

The above assumes the $1 values are always grouped together, as shown in your sample input. Given that, it stores only one record in memory at a time (which could be important, since you say your input file is huge), prints the records in the same order they were read, works even for zero or negative $2 values, and outputs nothing for an empty input file.
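To make the last two properties concrete, here is a small sketch that runs the same logic on negative values and on empty input. The wrapper name max_per_group_stream and the sample rows are hypothetical, and "max = $2 + 0" is a small deviation from the script above, added only to make the numeric comparison explicit.

```shell
# Hypothetical wrapper around the streaming approach. It assumes rows with
# the same field 1 are adjacent, and holds only one candidate row at a time.
max_per_group_stream() {
  awk '
    # New group: flush the previous winner, then start with this row.
    $1 != prev { if (rec != "") print rec; max = $2 + 0; rec = $0 }
    # Same group: keep this row if field 2 is strictly larger.
    $2 > max   { max = $2 + 0; rec = $0 }
    { prev = $1 }
    # Flush the winner of the last group, if there was any input.
    END { if (rec != "") print rec }
  ' "$@"
}

# Zero and negative values: the maximum is taken from the first row of each
# group rather than from an initial 0, so these are handled correctly.
printf '%s\n' 'AAAA -5 a' 'AAAA -2 b' 'BBBB 0 c' 'BBBB -1 d' |
  max_per_group_stream

# Empty input: rec is never set, so nothing is printed.
max_per_group_stream </dev/null
```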

Upvotes: 3
