Reputation: 43

Max of all columns based on distinct first column

How do I change the code if I have more than two columns. Let's say the data is like this

ifile.dat
1   10  15
3   34  20
1   4   22
3   32  33
5   3   46
2   2   98
4   20  100
3   13  23
4   50  65
1   40  76
2   20  22

How do I achieve this?

ofile.dat
1   40  76
2   20  98
3   34  33
4   50  100
5   3   46

I mean the max of each column by comparing first column. Thanks.

Here is what I have tried(on a sample file with 13columns). But the highest value is not coming up this way.

cat input.txt | sort -k1,1 -k2,2nr -k3,3nr -k4,4nr -k5,5nr -k6,6nr -k7,7nr -k8,8nr -k9,9nr -k10,10nr -nrk11,11 -nrk12,12 -nrk13,13 | sort -k1,1 -u

Upvotes: 0

Answers (1)

JNevill

Reputation: 50034

Instead of relying on sort, you could switch over to something more robust like awk:

 awk 'BEGIN{PROCINFO["sorted_in"] = "@val_num_asc"} {for(i=2;i<=NF;++i) if (a[$1][i]<$i){a[$1][i]=$i}} END{n=asorti(a, asorted); for(col1 in asorted){print col1, a[col1][2], a[col1][3]}}' input.txt

That's a mouthful. It breaks down like:

Before processing the file set the PROCINFO setting of sorted_in to @val_num_asc since we will be sorting the contents of an array by its index which will be numeric: (BEGIN{PROCINFO["sorted_in"] = "@val_num_asc"})
Loop through each field in the record for(i=2;i<=NF;++i)
Test to see if the current value is greater than the stored value in the array which has as its index the first field's value. If it is greater, then store it in place of whatever is currently held in the array for that index at that fields position: (if (a[$1][i]<$i){a[$1][i]=$i})
When done processing all of the records, sort the array into new array asorted: (END{n=asorti(a, asorted);)
Iterate through the array and print each element (for(col1 in asorted){print col1, a[col1][2], a[col1][3]})

There may be a more elegant way to do this in awk, but this will do the trick.

Example:

:~$ cat test
1   10  15
3   34  20
1   4   22
3   32  33
5   3   46
2   2   98
4   20  100
3   13  23
4   50  65
1   40  76
2   20  22
:~$ awk 'BEGIN{PROCINFO["sorted_in"] = "@val_num_asc"} {for(i=2;i<=NF;++i) if (a[$1][i]<$i){a[$1][i]=$i}} END{n=asorti(a, asorted); for(col1 in asorted){print col1, a[col1][2], a[col1][3]}}' test
1 40 76
2 20 98
3 34 33
4 50 100
5 3 46

Upvotes: 1

Max of all columns based on distinct first column

Answers (1)

Related Questions