user1579465

Reputation: 11

Sort an array by alphabetical order in awk

I have a tab-separated file that looks like this:

1    a
3    b
2    b
9    a
0    a
5    c
8    b

I'd like...

  1. to print only the last instance of each value found in column 2, together with its corresponding value in column 1;
  2. to sort the result of step 1 alphabetically by the content of column 2;
  3. to add a third column to the output, placed before column 1, whose content depends on the column 2 value;
  4. to replace the tabs with newlines;

... all this in a single awk program.

So the final output would look something like this:

x
0
a
x
8
b
y
5
c

I managed to do all this, but only by using two awk programs and one external command:

awk -F '\t' '
    { value[$2] = $2 "\t" $1 }
    END { for (i in value) print value[i] }
' | \
sort -dfb | \
awk -F '\t' '
    $1 == "a" || $1 == "b" { print "x\n" $2 "\n" $1 }
    $1 == "c"              { print "y\n" $2 "\n" $1 }
'

A simpler way would be to sort the array of the first awk program alphabetically by its indices. That would let me merge the second awk program into the first. However, I have no idea how to do this. Any ideas?

Upvotes: 0

Views: 2142

Answers (2)

dan

Reputation: 1

This question is six years old, and here I am replying anyway. If I understand the request, the list of values is:

1    a
3    b
2    b
9    a
0    a
5    c
8    b

It is to be reduced to one instance of each column 2 value, paired with the lowest associated value in column 1. The desired result:

0    a
2    b
5    c

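Two sorts seemed simpler to me than awk. With the list of values saved in FILE, the following commands produce the result. The -k keys are the portable equivalents of the obsolescent "+0 -1n" and "+1 -2" forms, which not every sort accepts. The pipeline relies on sort -u keeping the first line of each run of equal keys, as GNU sort does:

$ sort -k1,1n FILE | sort -k2,2 -u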
0    a
2    b
5    c

For the reverse, that is the highest column 1 value for each unique column 2 entry, sort column 1 in descending order first:

$ sort -k1,1nr FILE | sort -k2,2 -u
9    a
8    b
5    c

If awk is preferred over sort, the following program keeps the smallest column 1 value for each unique column 2 entry:

$ awk '{if($2 in COL2){if(COL2[$2]>$1){COL2[$2]=$1}}else{COL2[$2]=$1}}END{for(I in COL2){print COL2[I],I}}' FILE
0 a
2 b
5 c

To get the highest column 1 value for each unique column 2 entry instead, replace ">" with "<":

$ awk '{if($2 in COL2){if(COL2[$2]<$1){COL2[$2]=$1}}else{COL2[$2]=$1}}END{for(I in COL2){print COL2[I],I}}' FILE
9 a
8 b
5 c
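
Note that for (I in COL2) visits the indices in an unspecified order, so the a, b, c ordering shown above is not guaranteed on every awk. The following is a minimal sketch, assuming the sample data from the question; the array name min and the shell variable lowest are hypothetical. It adds 0 to both operands to force a numeric comparison and pipes the result through sort to fix the order:

```shell
# Sample data from the question, fed on stdin (tab-separated).
lowest=$(printf '1\ta\n3\tb\n2\tb\n9\ta\n0\ta\n5\tc\n8\tb\n' |
  awk '
    # Keep the smallest column 1 value seen for each column 2 key.
    # Adding 0 forces a numeric rather than a string comparison.
    !($2 in min) || $1 + 0 < min[$2] + 0 { min[$2] = $1 }
    END { for (k in min) print min[k], k }
  ' | sort -k2,2)   # order the lines by column 2, since for-in order is unspecified

printf '%s\n' "$lowest"
```

Swapping "<" for ">" in the pattern would keep the highest value instead.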

Possibly I missed the requirements, and 6 years later is not a very timely response. I was looking for something else, and found this and couldn't help myself.

Upvotes: 0

Dimitre Radoulov

Reputation: 28000

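GNU awk <= 3 (setting the WHINY_USERS environment variable makes the for-in loop visit the indices in sorted order):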

WHINY_USERS= awk 'END {
  for (R in r)
    printf "%s\n%s\n%s\n", 
      (R ~ /^[ab]$/ ? "x" : "y" ), r[R], R
  }
{
  r[$2] = $1
  }' infile

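GNU awk >= 4 (PROCINFO["sorted_in"] controls the traversal order; "@ind_str_asc" sorts the indices as strings, ascending):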

awk 'END {
  PROCINFO["sorted_in"] = "@ind_str_asc"
  for (R in r)
    printf "%s\n%s\n%s\n", 
      (R ~ /^[ab]$/ ? "x" : "y" ), r[R], R
  }
{
  r[$2] = $1
  }' infile
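
For awk implementations without PROCINFO["sorted_in"] (mawk or BusyBox awk, for instance), one portable option is to copy the indices into a numerically indexed array and sort that by hand. The following is only a sketch under those assumptions: the helper sort_keys, its insertion sort, and the shell variable result are hypothetical names, not part of either program above, and the input is the sample data from the question fed on stdin.

```shell
result=$(printf '1\ta\n3\tb\n2\tb\n9\ta\n0\ta\n5\tc\n8\tb\n' |
  awk -F '\t' '
    # Hypothetical helper: copy the indices of arr into keys[1..n],
    # insertion-sort them as strings, and return n.
    function sort_keys(arr, keys,    k, n, i, j, tmp) {
      n = 0
      for (k in arr) keys[++n] = k
      for (i = 2; i <= n; i++) {
        tmp = keys[i]
        # Concatenating "" forces a string (alphabetical) comparison.
        for (j = i - 1; j >= 1 && (keys[j] "") > (tmp ""); j--)
          keys[j + 1] = keys[j]
        keys[j + 1] = tmp
      }
      return n
    }
    # Later lines overwrite earlier ones, so the last instance wins.
    { r[$2] = $1 }
    END {
      n = sort_keys(r, keys)
      for (i = 1; i <= n; i++)
        printf "%s\n%s\n%s\n", (keys[i] ~ /^[ab]$/ ? "x" : "y"), r[keys[i]], keys[i]
    }
  ')

printf '%s\n' "$result"
```

An insertion sort is quadratic, which is fine for a handful of distinct keys; for large key sets, piping through sort as in the question is likely the more practical route.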

Upvotes: 1
