zhihao_li

Reputation: 183

bash: sorting based on numerical distance

I have a file containing a list of people with their gender and age like this:

name1    M    73.2
name2    M    31.5
name3    F    20.3
name4    F    55.0
...

Is there a bash one-liner to sort this list based on numerical distances to a given age, say 30.0, so that the result becomes:

name2    M    31.5
name3    F    20.3
name4    F    55.0
name1    M    73.2

Upvotes: 0

Views: 149

Answers (4)

Shawn

Reputation: 52344

Another approach, using perl instead of awk:

$ age=30 perl -anE 'push @lines, [@F, abs($ENV{age} - $F[2])];
   END { say join("\t", $_->@[0..2]) for sort { $a->[3] <=> $b->[3] } @lines }' input.txt 
name2   M   31.5
name3   F   20.3
name4   F   55.0
name1   M   73.2

Upvotes: 0

Jonathan Leffler

Reputation: 753645

Any version of Awk

awk -v ref=30.0 '{ print $1, $2, $3, ($3 < ref) ? ref - $3 : $3 - ref }' |
sort -k4,4n |
awk '{ print $1, $2, $3 }'

Add the distance from the reference age as an extra column, sort on it, remove it. You could use cut for the removal operation if you prefer. If you use GNU Awk, you can do it all in awk. There are ways to preserve the spacing if that's important to you.

You can write it all on one line if you insist; that's your choice.
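One way to use cut and still keep the original spacing is to put the distance in front of each line, separated by a tab. cut -f2- can then strip it off again without touching the rest of the line. The sketch below assumes a hypothetical shell function named by_distance. It takes the reference age as its first argument and reads the named files, or standard input when there are none. It needs only POSIX awk, sort and cut.

```shell
# Hypothetical helper: print lines sorted by |$3 - REF|, keeping each line's
# original spacing. Usage: by_distance REF [FILE...]  (stdin if no FILE)
by_distance() {
    ref=$1
    shift
    # Prefix each line with its distance and a tab, sort numerically on that
    # prefix (C locale, so "." is the decimal point), then let cut -f2-
    # remove the prefix and print the rest of the line verbatim.
    awk -v ref="$ref" '{ d = $3 - ref; if (d < 0) d = -d; print d "\t" $0 }' "$@" |
        LC_ALL=C sort -n |
        cut -f2-
}

# Example run on the sample data from the question, with 30.0 as the age.
printf 'name1    M    73.2\nname2    M    31.5\nname3    F    20.3\nname4    F    55.0\n' |
    by_distance 30.0
```

Lines at an equal distance fall back to sort's whole-line comparison, so they come out ordered by the text of the original line.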

All-in-one using GNU Awk

Checking the GNU Awk manual shows that there isn't an abs() built-in function, which is a little surprising. GNU Awk does have the asort() and asorti() functions which can be used to sort the data internally, thereby allowing the code to use a single call to awk and no calls to the sort command. This also preserves the spacing in the original data.

This variation uses the 'square of the distance' idea suggested by zhihao_li in their answer.

gawk -v ref=48.0 '
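function comp_idx(i1, v1, i2, v2) {
    # Indexes look like "distance SUBSEP NR"; adding 0 keeps just the numeric distance
    if (i1+0 < i2+0) return -1; else if (i1+0 > i2+0) return +1; else return 0;
}
    # Key on (distance, NR) so lines at the same distance do not overwrite each other
    { data[($3-ref)^2, NR] = $0 }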
END { 
      n = asorti(data, results, "comp_idx")
      for (i = 1; i <= n; i++) print data[results[i]]
    }' "$@"

The +0 operations in the comp_idx function are necessary to force awk to treat the index values as numbers rather than strings. Without those, the sort order was based on the lexicographical (not numeric) order of the squared distances. If a single line is important, you could write that all on one line, but you'd need a sprinkling of semicolons added too. I don't recommend it.
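Awk array subscripts are always strings, so without the +0 the comparison function compares them as text. Two squared distances from the 48.0 example show the difference: 49 for name4 and 272.25 for name2. This is a stand-alone illustration, not part of the script above.

```shell
# Both variables hold strings, so the first comparison is character by
# character ("4" sorts after "2", so "49" is not less than "272.25"), while
# adding 0 converts them to numbers and compares 49 with 272.25 numerically.
awk 'BEGIN {
    a = "49"; b = "272.25"
    lex = (a < b)
    num = (a + 0 < b + 0)
    print lex, num
}'
```

It prints 0 1. Compared as strings, name2's key would sort first. Compared as numbers, name4's key sorts first, which matches the output shown below for 48.0.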

You could revise the code into a more comprehensive shell script that takes the age as an argument that's passed to Awk (the -v ref=30.0 mechanism). That's more fiddly than difficult. As it stands, it just processes the files it is given — or standard input if no files are given.

With the sample data, the output for the reference age of 48.0 is:

name4    F    55.0
name2    M    31.5
name1    M    73.2
name3    F    20.3

Change the reference age from 48.0 to 30.0 as in the question and the result is:

name2    M    31.5
name3    F    20.3
name4    F    55.0
name1    M    73.2

Upvotes: 1

zhihao_li

Reputation: 183

The discussion above about adding another column was helpful. I came up with this solution, where ${ag} holds the given age. Squaring the difference is simpler than taking the absolute value, and it gives the same ordering because squaring is strictly increasing for non-negative distances.

awk -v a="${ag}" '{print $1,$2,$3,($3-a)^2}' | sort -n -k4,4 | cut -d' ' -f1-3

Upvotes: 0

David C. Rankin

Reputation: 84551

In a similar manner, if you need to preserve the original line format, you can append the distance to the whole line instead of printing only the first three fields. After sort, truncate each line just after the third field, e.g.

awk 'function abs(v) { return v < 0 ? -v : v }
    { print $0"\t"abs($NF-30) }' file | 
sort -k4n |
awk '{ print substr($0, 1, index($0, $3) + length($3) - 1) }'

Example Use/Output

With your example file in the file named file, you would get:

$ awk 'function abs(v) { return v < 0 ? -v : v }
>     { print $0"\t"abs($NF-30) }' file |
> sort -k4n |
> awk '{ print substr($0, 1, index($0, $3) + length($3) - 1) }'
name2    M    31.5
name3    F    20.3
name4    F    55.0
name1    M    73.2

(Note: you can select-copy the original awk command and then, in an xterm with file in the current working directory, middle-mouse-paste it to test.)
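Locating $3 in the line can, in principle, match an earlier occurrence of the same text. The only thing after the last tab is the distance that the first awk appended, so an alternative is to delete that final tab and whatever follows it. The sketch below wraps this in a hypothetical function keep_format_sort, which takes the reference age as its first argument and reads standard input. Like the pipeline above, sort -k4n still assumes three blank-separated columns, with no spaces inside a name.

```shell
# Hypothetical helper; the reference age is $1 and input is read from stdin.
# The first awk appends a tab and |$NF - ref| to each line, sort orders on that
# fourth blank-separated field (C locale, so "." is the decimal point), and the
# last awk deletes the final tab and the distance after it, leaving each
# original line exactly as it was.
keep_format_sort() {
    awk -v ref="$1" 'function abs(v) { return v < 0 ? -v : v }
        { print $0 "\t" abs($NF - ref) }' |
    LC_ALL=C sort -k4n |
    awk '{ sub(/\t[^\t]*$/, ""); print }'
}

printf 'name1    M    73.2\nname2    M    31.5\nname3    F    20.3\nname4    F    55.0\n' |
    keep_format_sort 30
```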

Upvotes: 2
