Reputation: 445
I have a text file with the following contents, where my delimiter/separator is ##@##:
Steve##@##Jobs##@##Apple Inc.##@##32421213
Bill##@##Gates##@##Microsoft Corp.##@##234213
Steve##@##Wozniak##@##Apple Inc.##@##12343
Tim##@##Cook##@##Apple Inc.##@##323345223
Now I want the lines sorted numerically in ascending order by the fourth field. I read that the sort command could do this, but unfortunately it only supports a single character as the delimiter.
The sorted file should look exactly like this:
Steve##@##Wozniak##@##Apple Inc.##@##12343
Bill##@##Gates##@##Microsoft Corp.##@##234213
Steve##@##Jobs##@##Apple Inc.##@##32421213
Tim##@##Cook##@##Apple Inc.##@##323345223
Is there a fix for sort or can I do this using awk?
Upvotes: 0
Views: 400
Reputation: 7837
Because sort(1) accepts only a single-character delimiter, convert your separator string into a single character that sort recognizes but that does not appear in your data. The safest choice is a non-printable character. A reasonable candidate is the ASCII file separator (FS), octal 034. After sorting, restore the original separator.
In bash, $'...' quoting gives direct access to characters by octal value; other shells may differ. Then sed makes it a snap:
$ s=$'\034'
$ sed "s/##@##/$s/g" dat | sort -t "$s" -k4,4n | sed "s/$s/##@##/g"
Steve##@##Wozniak##@##Apple Inc.##@##12343
Bill##@##Gates##@##Microsoft Corp.##@##234213
Steve##@##Jobs##@##Apple Inc.##@##32421213
Tim##@##Cook##@##Apple Inc.##@##323345223
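For shells without $'...' quoting, the same substitute, sort, restore pipeline can be sketched with printf, which handles octal escapes in any POSIX shell. This is only a sketch: the function name sort_by_field4 and the temporary file are illustrative and not part of the answer above, and the grep check is an added safeguard that assumes you would rather fail than sort a file that already contains the placeholder byte.

```shell
#!/bin/sh
# Sketch: sort lines numerically on field 4 when fields are separated by "##@##".
# sort_by_field4 is a hypothetical helper name, not from the original answer.
sort_by_field4() {
    s=$(printf '\034')   # ASCII 0x1C, built portably via printf
    # Refuse to continue if the placeholder byte already occurs in the input.
    if grep -q "$s" "$1"; then
        echo "placeholder byte already present in $1" >&2
        return 1
    fi
    sed "s/##@##/$s/g" "$1" | sort -t "$s" -k4,4n | sed "s/$s/##@##/g"
}

# Sample data from the question, written to a temporary file.
dat=$(mktemp)
cat > "$dat" <<'EOF'
Steve##@##Jobs##@##Apple Inc.##@##32421213
Bill##@##Gates##@##Microsoft Corp.##@##234213
Steve##@##Wozniak##@##Apple Inc.##@##12343
Tim##@##Cook##@##Apple Inc.##@##323345223
EOF

sorted=$(sort_by_field4 "$dat")
printf '%s\n' "$sorted"
rm -f "$dat"
```

Restricting the key to -k4,4 rather than -k4 makes no difference here because the number is the last field, but it keeps the sort correct if more fields are appended later.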
Upvotes: 0
Reputation: 23667
A solution using perl alone, with no need for other commands:
$ cat ip.txt
Steve##@##Jobs##@##Apple Inc.##@##32421213
Bill##@##Gates##@##Microsoft Corp.##@##234213
abc##@##xyz##@##123 Corp.##@##234213
Steve##@##Wozniak##@##Apple Inc.##@##12343
Tim##@##Cook##@##Apple Inc.##@##323345223
$ perl -ne '($k)=/(\d+)$/; $h{$k} .= $_; END{foreach (sort {$a <=> $b} keys %h){print $h{$_}}}' ip.txt
Steve##@##Wozniak##@##Apple Inc.##@##12343
Bill##@##Gates##@##Microsoft Corp.##@##234213
abc##@##xyz##@##123 Corp.##@##234213
Steve##@##Jobs##@##Apple Inc.##@##32421213
Tim##@##Cook##@##Apple Inc.##@##323345223
Upvotes: 0
Reputation: 23850
Here's a (hackish) idea. Use awk to add the numeric field to the beginning of each line so that we can sort it with sort, and then use sed to get rid of the stuff that we added in the first step. Something like this:
awk -vFS='##@##' '{print $4 "|" $0}' input | sort -n | sed -e 's/^[^|]*|//'
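If several lines share the same number, it may matter which one comes first. A possible variant of the same decorate, sort, undecorate idea is sketched below. It assumes a sort that supports the -s (stable) option, which GNU and BSD sort provide but POSIX does not require, and it swaps the "|" prefix separator for a tab so that cut can strip the prefix. The sample data is the ip.txt from the perl answer, which contains a tie; the temporary file name is illustrative.

```shell
#!/bin/sh
# Sample data with a tie on the numeric field (two lines ending in 234213).
input=$(mktemp)
cat > "$input" <<'EOF'
Steve##@##Jobs##@##Apple Inc.##@##32421213
Bill##@##Gates##@##Microsoft Corp.##@##234213
abc##@##xyz##@##123 Corp.##@##234213
Steve##@##Wozniak##@##Apple Inc.##@##12343
Tim##@##Cook##@##Apple Inc.##@##323345223
EOF

tab=$(printf '\t')

# Decorate: prepend field 4 and a tab to each line.
# Sort: numeric on the prefix only; -s keeps input order for equal keys
#       (assumes GNU or BSD sort).
# Undecorate: cut drops the prefix and leaves the original line intact.
result=$(awk -F'##@##' '{print $4 "\t" $0}' "$input" |
    sort -s -t "$tab" -k1,1n |
    cut -f2-)
printf '%s\n' "$result"
rm -f "$input"
```

Without -s, sort breaks ties by comparing the whole line, so the relative order of the two 234213 lines would depend on the locale's collation.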
Upvotes: 2