Håkon Hægland

Reputation: 40758

Sorting multiple arrays simultaneously in awk

Introduction

Consider the following example sort.awk:

BEGIN {
    a[1]="5"; 
    a[2]="3";
    a[3]="6";

    asort(a)
    for (i=1; i<=3; i++) print a[i]
}

Running with awk -f sort.awk prints the sorted numbers in array a in ascending order:

3
5
6

Question

Consider the extended case of two (or, in general, N) corresponding arrays a and b

a[1]="5"; b[1]="fifth"
a[2]="3"; b[2]="third"
a[3]="6"; b[3]="sixth"

and the problem of sorting all arrays "simultaneously". To achieve this, I need to sort array a and also obtain the indices of the sort. For this simple case, the indices would be given by

ind[1]=2; ind[2]=1; ind[3]=3;

With these indices, I can also print the b array in the order given by sorting array a. For instance:

for (i=1;i<=3;i++) print a[ind[i]], b[ind[i]]

will print the sorted arrays.

See also Sort associative array with AWK.

Upvotes: 1

Views: 576

Answers (2)

Kent

Reputation: 195079

I came up with two methods to do your "simultaneous" sort.

  • One is to combine the two arrays and then sort. This is useful when you just need the output.

  • The other is to use gawk's asorti().

Read the code for details; I think it is easy to understand:

BEGIN{
    a[1]="5"; b[1]="fifth"
    a[2]="3"; b[2]="third"
    a[3]="6"; b[3]="sixth"

    #method 1: combine the two arrays before sort
    for(;++i<=3;)
        n[i] = a[i]" "b[i]
    asort(n)
    print "--- method 1: ---"
    for(i=0;++i<=3;)
        print n[i]

    #method 2:
    #here we build a new array (hash table) keyed by the values of a,
    #and use asorti(); this assumes the values in a are unique
    for(i=0;++i<=3;)
        x[a[i]]=b[i]

    asorti(x,t)
    print "--- method 2: ---"
    for(i=0;++i<=3;)
        print t[i],x[t[i]]
}

output:

kent$  awk -f sort.awk
--- method 1: ---
3 third
5 fifth
6 sixth
--- method 2: ---
3 third
5 fifth
6 sixth

EDIT

If you want to get the original index, you can try method 3 as follows:

#method 3: 
print "--- method 3: ---"
for(i=0;++i<=3;)
    c[a[i]] = i;

asort(a)
for(i=0;++i<=3;)
    print a[i], " | related element in b: "b[c[a[i]]], " | original idx: " c[a[i]] 

The output is:

--- method 3: ---
3  | related element in b: third  | original idx: 2
5  | related element in b: fifth  | original idx: 1
6  | related element in b: sixth  | original idx: 3

As you can see, the original idx is there. If you want to save the indices into an array, just add idx[i]=c[a[i]] in the for loop.

EDIT2

Method 4: combine each value with its index (value first, index second), sort, then split to get the idx array:

#method 4:

for(i=0;++i<=3;)
    m[i] = a[i]"\x99"i 
asort(m)
print "--- method 4: ---"
for(i=0;++i<=3;){
    split(m[i],x,"\x99")
    ind[i]=x[2]
}

#test ind array:
for(i=0;++i<=3;)
    print i"->"ind[i]

output:

--- method 4: ---
1->2
2->1
3->3

Upvotes: 2

Håkon Hægland

Reputation: 40758

Based on Kent's answer, here is a solution that also obtains the indices:

BEGIN {
    a[1]="5";
    a[2]="3";
    a[3]="6";

    for (i=1; i<=3; i++) b[i]=a[i]" "i
    asort(b)
    for (i=1; i<=3; i++) {
      split(b[i],c," ")
      ind[i]=c[2]
    }
    for (i=1; i<=3; i++) print ind[i]
}

Upvotes: 0
