user2065960

Reputation: 45

How to get the max value in every group of n rows using AWK?

Say I have the following 3 columns in a text file:

1 003 3  
2 006 1  
3 005 4  
4 001 2  
5 006 7  
6 002 2  
7 004 3  
8 001 6  
9 002 8  
10 005 2

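I want to output 3 columns for every group of 3 lines, where the groups start after the first line (line 1 is a group on its own, then lines 2-4, 5-7 and 8-10): column 1 of the last line in the group, the column 2 value from the line with the largest column 3 value in the group, and that largest column 3 value.

So from that input, the output would be: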

1 003 3  
4 005 4  
7 006 7  
10 002 8

What I tried:

awk \
        'BEGIN{
                cnt=3;
                max=0;
        };

        {
                if (cnt == 3){
                        cnt++;
                        max_arr[cnt]=$3;
                        for (i in max_arr){
                                if (max_arr[i] > max)
                                        { max = max_arr[i] }
                        }

                        printf "%s %s %s\n", $1,$2,max;
                        cnt=1;
                        delete max_arr;
                        max=0;
                }
                else{
                        cnt++;
                        max_arr[cnt]=$3;
                }
}' input_file.txt

This gives me:

1 003 3  
4 001 4  
7 004 7  
10 005 8

Columns 1 and 3 are correct, but column 2 is wrong.
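Why column 2 comes out wrong: max_arr only ever holds column 3 values, and the printf uses $2 of the current line, which is the last line of the group (001 on line 4, 004 on line 7), not the line where the maximum was found. Below is a minimal sketch of a fix that keeps the question's cnt counter but remembers column 2 at the same moment the running max is updated, so no array is needed. The first flag is an addition of mine so that the first row of each group always seeds the max (the original max=0 assumes positive values), and ties keep the earlier row, as the original `>` does. The file name fixed.awk is only for illustration:

```shell
# The question's sample input.
printf '%s\n' '1 003 3' '2 006 1' '3 005 4' '4 001 2' '5 006 7' \
    '6 002 2' '7 004 3' '8 001 6' '9 002 8' '10 005 2' > input_file.txt

cat > fixed.awk <<'EOF'
BEGIN { cnt = 3; first = 1 }   # as in the question: line 1 is a group on its own
{
    # Save column 2 together with the running max of column 3.
    # "first" lets the first row of each group win, so negatives work too.
    if (first || $3 > max) { max = $3; val = $2; first = 0 }

    if (cnt == 3) {            # this row closes the group
        print $1, val, max
        cnt = 1
        first = 1              # the next row starts a new group
    } else {
        cnt++
    }
}
EOF

awk -f fixed.awk input_file.txt
```

This prints the four expected lines, 1 003 3 through 10 002 8.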

Upvotes: 1

Views: 837

Answers (4)

Ed Morton

Reputation: 203169

This is how you do it robustly:

$ cat tst.awk
{
    isBlockBeg = ( (NR%3)==2 )
    isBlockEnd = ( (NR%3)==1 )
}
isBlockBeg { max=$3 }
$3 >= max  { max=$3; val=$2 }
isBlockEnd { print $1, val, max }
END { if (!isBlockEnd) print $1, val, max }

$ awk -f tst.awk file
1 003 3
4 005 4
7 006 7
10 002 8

Note that the above will work whether your data is numbers or strings, whether or not your data is all-negative, and even if your data doesn't end nicely at the end of a block of 3. If you don't need that last part, you can reduce it to just:

$ cat tst.awk
(NR%3)==2 { max=$3 }
$3 >= max { max=$3; val=$2 }
(NR%3)==1 { print $1, val, max }

$ awk -f tst.awk file
1 003 3
4 005 4
7 006 7
10 002 8
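To see the claim about data that doesn't end on a block boundary in action, here is the first, longer tst.awk run against the question's data plus a hypothetical 11th line, 11 007 5, which starts an incomplete block. The END block prints that partial block; the reduced version does not:

```shell
# Hypothetical input: the question's 10 lines plus an 11th line,
# so the last block of 3 is incomplete.
printf '%s\n' '1 003 3' '2 006 1' '3 005 4' '4 001 2' '5 006 7' \
    '6 002 2' '7 004 3' '8 001 6' '9 002 8' '10 005 2' '11 007 5' > file

cat > tst.awk <<'EOF'
{
    isBlockBeg = ( (NR%3)==2 )
    isBlockEnd = ( (NR%3)==1 )
}
isBlockBeg { max=$3 }
$3 >= max  { max=$3; val=$2 }
isBlockEnd { print $1, val, max }
END { if (!isBlockEnd) print $1, val, max }
EOF

awk -f tst.awk file    # last line printed: 11 007 5

# The reduced version stops at "10 002 8" and silently drops line 11.
awk '(NR%3)==2 { max=$3 } $3 >= max { max=$3; val=$2 } (NR%3)==1 { print $1, val, max }' file
```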

Upvotes: 2

Aserre

Reputation: 5062

You could try the following awk script:

# file : script.awk

# if max[1] is uninitialized OR ...
# if the 3rd field of our current line is greater than the one stored in our max array ...
# we store the 2nd and 3rd field of our line in the array
!(1 in max) || max[1]<$3 { max[0]=$2; max[1]=$3; }

# if the remainder of our line_number / 3 == 1 (lines 4, 7, 10, ...)
NR % 3 == 1 {
    # we print the line_number, and the 2 max values
    print NR,max[0],max[1]

    # we delete the old array
    delete max
}

You can then call it like this: awk -f script.awk data

Sample input:

> cat data
1 003 3
2 006 1
3 005 4
4 001 2
5 006 7
6 002 2
7 004 3
8 001 6
9 002 8
10 005 2

Sample output:

> awk -f script.awk data
1 003 3
4 005 4
7 006 7
10 002 8
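The question title asks about groups of n rows. Below is a sketch of a variant of the script above with the group size passed in via -v n=... (the variable name n and the file name groupmax.awk are my own choices, not part of this answer). It prints $1 instead of NR, which is the same thing for this data, and uses a flag instead of deleting an array. As in the question, line 1 forms a group on its own, followed by groups of n lines; like the script above, an incomplete group at the end of the input is not printed:

```shell
# The question's sample input, as in the "data" file above.
printf '%s\n' '1 003 3' '2 006 1' '3 005 4' '4 001 2' '5 006 7' \
    '6 002 2' '7 004 3' '8 001 6' '9 002 8' '10 005 2' > data

cat > groupmax.awk <<'EOF'
# n = group size, passed with -v (must be 2 or more).
# "have" is 0 at the first row of each group, so that row always
# becomes the starting max, whatever its sign.
!have || $3 > m3 { have = 1; m2 = $2; m3 = $3 }

# Rows 1, n+1, 2n+1, ... close a group: print column 1, then the pair.
NR % n == 1 { print $1, m2, m3; have = 0 }
EOF

awk -v n=3 -f groupmax.awk data    # same output as above
awk -v n=2 -f groupmax.awk data    # groups: 1, 2-3, 4-5, 6-7, 8-9 (line 10 dropped)
```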

Upvotes: 2

karakfa

Reputation: 67467

If the $3 values are all positive:

$ awk '$3>m3   {m3=$3; v2=$2} 
       NR%3==1 {print $1,v2,m3; m3=0}' file

1 003 3
4 005 4
7 006 7
10 002 8
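The all-positive restriction comes from resetting m3 to 0: in a group whose column 3 values are all negative, $3>m3 never succeeds, so the previous group's column 2 and the reset value 0 are printed. A quick demonstration with hypothetical input (lines 2-4 made negative), next to the reduced version from Ed Morton's answer, which seeds the max from the first row of each group instead:

```shell
# Hypothetical input: line 1 as in the question, then a group of 3
# whose column 3 values are all negative.
printf '%s\n' '1 003 3' '2 006 -1' '3 005 -4' '4 001 -2' > neg

# This answer's one-liner: m3 is reset to 0, so no negative value beats it.
awk '$3>m3   {m3=$3; v2=$2}
     NR%3==1 {print $1,v2,m3; m3=0}' neg
# second line printed: 4 003 0  (stale column 2, and 0 instead of -1)

# Seeding max from the first row of the group avoids the problem.
awk '(NR%3)==2 { max=$3 }
     $3 >= max { max=$3; val=$2 }
     (NR%3)==1 { print $1, val, max }' neg
# second line printed: 4 006 -1
```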

Upvotes: 1

oliv

Reputation: 13239

A shorter awk script could be this one:

awk 'm<$3{m=$3;n=$2} !((NR+2)%3){print $1,n,m;m=n=""}' file

where m holds the max value of column 3 and n the corresponding value of column 2.

The condition !((NR+2)%3) is true for the first line and every third line after it (lines 4, 7, 10, ...). Its action prints the wanted values and then resets both m (the max of column 3) and n.
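(NR+2)%3 is 0 exactly when NR leaves a remainder of 1 when divided by 3, so the condition picks the same lines as NR%3==1 in the other answers. A quick check over NR = 1..10, using a loop variable nr in a BEGIN block in place of real input:

```shell
awk 'BEGIN {
    for (nr = 1; nr <= 10; nr++)
        if (!((nr + 2) % 3))          # the condition from this answer
            print nr, (nr % 3 == 1)   # second field: 1 if NR%3==1 agrees
}'
# prints, one pair per line: 1 1, 4 1, 7 1, 10 1
```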

Upvotes: 2
