Otaku Kyon

Reputation: 445

Sorting lines numerically by field in AWK

I have a text file containing these lines, where the field separator is ##@##:

Steve##@##Jobs##@##Apple Inc.##@##32421213
Bill##@##Gates##@##Microsoft Corp.##@##234213
Steve##@##Wozniak##@##Apple Inc.##@##12343
Tim##@##Cook##@##Apple Inc.##@##323345223

Now I want them sorted numerically, in ascending order, by the fourth field (the number). I read that the sort command could do this, but unfortunately it only supports a single character as the delimiter.

The sorted file should look exactly like this:

Steve##@##Wozniak##@##Apple Inc.##@##12343
Bill##@##Gates##@##Microsoft Corp.##@##234213
Steve##@##Jobs##@##Apple Inc.##@##32421213
Tim##@##Cook##@##Apple Inc.##@##323345223

Is there a workaround for sort, or can I do this with awk?

Upvotes: 0

Views: 400

Answers (3)

James K. Lowden

Reputation: 7837

Because sort(1) accepts only a single-character delimiter, you want to convert your separator string into a single character that sort recognizes. Your best choice is something that cannot appear in the data: a non-printable character. A reasonable candidate is the ASCII field separator, octal 034. Then, of course, you have to restore your separator after sorting.

If you use bash, its $'...' quoting gives you straightforward access to characters by octal value; other shells may vary. Then sed makes it a snap:

$ s=$'\034'
$ sed "s/##@##/$s/g" dat | sort -t "$s" -k4,4n | sed "s/$s/##@##/g"

Steve##@##Wozniak##@##Apple Inc.##@##12343
Bill##@##Gates##@##Microsoft Corp.##@##234213
Steve##@##Jobs##@##Apple Inc.##@##32421213
Tim##@##Cook##@##Apple Inc.##@##323345223
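If your shell lacks bash's $'...' quoting, a portable variant can build the same separator byte with printf, whose octal escapes are specified by POSIX. This is a sketch assuming a POSIX sh; the file name dat comes from the answer, while the helper name sort_dat is hypothetical. It also narrows the sort key to -k4,4n so that only field 4 is compared.

```shell
#!/bin/sh
# Assumption: a POSIX sh without bash's $'...' quoting.
# printf turns the octal escape \034 into the ASCII field-separator byte.
s=$(printf '\034')

# Sample input from the question, saved as the file "dat".
cat > dat <<'EOF'
Steve##@##Jobs##@##Apple Inc.##@##32421213
Bill##@##Gates##@##Microsoft Corp.##@##234213
Steve##@##Wozniak##@##Apple Inc.##@##12343
Tim##@##Cook##@##Apple Inc.##@##323345223
EOF

# Hypothetical helper: swap in the one-byte separator, sort numerically
# on field 4 only (-k4,4n), then restore the original ##@## separator.
sort_dat() {
    sed "s/##@##/$s/g" dat | sort -t "$s" -k4,4n | sed "s/$s/##@##/g"
}

sort_dat
```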

Upvotes: 0

Sundeep

Reputation: 23667

A solution using Perl alone; no other commands are needed:

$ cat ip.txt 
Steve##@##Jobs##@##Apple Inc.##@##32421213
Bill##@##Gates##@##Microsoft Corp.##@##234213
abc##@##xyz##@##123 Corp.##@##234213
Steve##@##Wozniak##@##Apple Inc.##@##12343
Tim##@##Cook##@##Apple Inc.##@##323345223

$ perl -ne '($k)=/(\d+)$/; $h{$k} .= $_; END{foreach (sort {$a <=> $b} keys %h){print $h{$_}}}' ip.txt 
Steve##@##Wozniak##@##Apple Inc.##@##12343
Bill##@##Gates##@##Microsoft Corp.##@##234213
abc##@##xyz##@##123 Corp.##@##234213
Steve##@##Jobs##@##Apple Inc.##@##32421213
Tim##@##Cook##@##Apple Inc.##@##323345223
  • The number at the end of each line is used as the key
  • Each input line is appended to the hash entry for its key, so multiple lines with the same key are also handled
  • After all lines are processed, the keys are sorted numerically and the corresponding values are printed
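The same group-then-sort idea can be sketched in plain awk, which also answers the awk half of the question. POSIX awk has no built-in sort, so a small insertion sort over the distinct keys stands in for Perl's sort. Assumptions: the key is taken from field 4 (split on ##@##) rather than from trailing digits, and the helper name group_sort is hypothetical.

```shell
#!/bin/sh
# Sample input from the answer above, saved as ip.txt.
cat > ip.txt <<'EOF'
Steve##@##Jobs##@##Apple Inc.##@##32421213
Bill##@##Gates##@##Microsoft Corp.##@##234213
abc##@##xyz##@##123 Corp.##@##234213
Steve##@##Wozniak##@##Apple Inc.##@##12343
Tim##@##Cook##@##Apple Inc.##@##323345223
EOF

# Hypothetical helper: group lines by the numeric value of field 4,
# then print the groups in ascending key order.
group_sort() {
    awk -F'##@##' '
    {
        k = $4 + 0                        # numeric key (field 4)
        if (!(k in buf)) keys[++n] = k    # record each distinct key once
        buf[k] = buf[k] $0 ORS            # append line; ties keep input order
    }
    END {
        # Insertion sort of the distinct keys, ascending numerically.
        for (i = 2; i <= n; i++) {
            v = keys[i]
            for (j = i - 1; j > 0 && keys[j] > v; j--)
                keys[j + 1] = keys[j]
            keys[j + 1] = v
        }
        for (i = 1; i <= n; i++)
            printf "%s", buf[keys[i]]
    }' "$1"
}

group_sort ip.txt
```

Insertion sort is quadratic in the number of distinct keys, which is fine for small files; for large inputs the sort(1)-based answers should scale better.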

Upvotes: 0

redneb

Reputation: 23850

Here's a (hackish) idea: use awk to prepend the numeric field to each line, sort the result with sort, and then use sed to strip the prefix added in the first step. Something like this:

awk -vFS='##@##' '{print $4 "|" $0}' input | sort -n | sed -e 's/^[^|]*|//'
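A variation on this decorate/sort/undecorate approach, sketched under the same assumptions (input file named input, | as the marker): restrict sort to the prefix with -t '|' -k1,1n, and strip the prefix with cut -d '|' -f2-, which keeps everything after the first |, even if the original line itself contains |. The helper name dsu_sort is hypothetical.

```shell
#!/bin/sh
# Sample input from the question, saved as the file "input".
cat > input <<'EOF'
Steve##@##Jobs##@##Apple Inc.##@##32421213
Bill##@##Gates##@##Microsoft Corp.##@##234213
Steve##@##Wozniak##@##Apple Inc.##@##12343
Tim##@##Cook##@##Apple Inc.##@##323345223
EOF

# Hypothetical helper.
# Decorate:   prefix each line with field 4 and a "|" marker.
# Sort:       numerically on that prefix only (-t '|' -k1,1n).
# Undecorate: cut keeps fields 2 onward, i.e. the original line.
dsu_sort() {
    awk -F'##@##' '{ print $4 "|" $0 }' "$1" |
        sort -t '|' -k1,1n |
        cut -d '|' -f2-
}

dsu_sort input
```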

Upvotes: 2
