Reputation: 55
Given an input list like the following:
405:alice@level1
405:bob@level2
405:chuck@level1
405:don@level3
405:eric@level1
405:francis@level1
004:ac@jjj
004:la@jjj
004:za@zzz
101:amy@floor1
101:brian@floor3
101:christian@floor1
101:devon@floor1
101:eunuch@floor2
101:frank@floor3
005:artie@le2
005:bono@nuk1
005:bozo@nor2
(As you can see, the first field has been randomly sorted. The original input had the first field in numerical order, with 004 coming first, then 005, 101, 405, and so on. The second field is still in alphabetical order by its first character.)
What I want is a randomized sort at two levels. The first field, as separated by a colon ':', should be randomly sorted without regard to the second field, so that all lines sharing the same first field stay grouped together while the groups themselves are randomly distributed throughout the file. Within each group, the second field should be randomly sorted as well. I am unable to get this result, as I am not too familiar with sort keys and the like.
The desired output would look similar to this:
405:francis@level1
405:don@level3
405:eric@level1
405:bob@level2
405:alice@level1
405:chuck@level1
004:za@zzz
004:ac@jjj
004:la@jjj
101:christian@floor1
101:amy@floor1
101:frank@floor3
101:eunuch@floor2
101:brian@floor3
101:devon@floor1
005:bono@nuk1
005:artie@le2
005:bozo@nor2
Does anyone know how to achieve this type of sort?
Thank you!
Upvotes: 0
Views: 52
Reputation: 67537
Not as elegant, but a different method:
$ awk -F: '!($1 in a){a[$1]=c++} {print a[$1] "\t" $0}' file |
sort -R -k2 |
sort -nk1,1 -s |
cut -f2-
Or this alternative, which doesn't assume the input is already grouped:
$ sort -R file |
awk -F: '!($1 in a){a[$1]=c++} {print a[$1] "\t" $0}' |
sort -nk1,1 -s |
cut -f2-
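To see why the stable sort keeps each group together, it may help to run the tagging step on its own, without any shuffling. The sketch below is only an illustration: tag_and_group is a made-up name, and it assumes an awk plus a sort that supports -s (GNU and BSD sort both do).

```shell
# Hypothetical helper, not part of the answer: the same pipeline minus "sort -R".
tag_and_group() {
  # Prefix each line with the order in which its first field was first seen,
  # stable-sort numerically on that tag, then strip the tag again.
  awk -F: '!($1 in a){a[$1]=c++} {print a[$1] "\t" $0}' |
    sort -n -k1,1 -s |
    cut -f2-
}

# Small demo: the two 405 lines are not adjacent on input.
printf '405:a\n004:b\n405:c\n' | tag_and_group
```

This should print 405:a, 405:c and then 004:b. Groups come out in the order their key was first seen, and -s preserves the incoming order inside each group, which is what lets an earlier sort -R survive the regrouping.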
Upvotes: 1
Reputation: 46876
You can do this with awk pretty easily.
As a one-liner:
awk -F: 'BEGIN{cmd="sort -R"} $1 != key {close(cmd)} {key=$1; print | cmd}' input.txt
Or, broken apart for easier explanation:
-F:
- Set awk's field separator to colon.
BEGIN{cmd="sort -R"}
- Before we start, set a variable that is a command to do the "randomized sort". This one works for me on FreeBSD. Should work with GNU sort as well.
$1 != key {close(cmd)}
- If the current line has a different first field than the last one processed, close the output pipe...
{key=$1; print | cmd}
- And finally, set the "key" var, and print the current line, piping output through the command stored in the cmd variable.
This usage takes advantage of a bit of awk awesomeness. When you pipe through a string (be it stored in a variable or not), that pipe is automatically created upon use. You can close it any time, and a subsequent use will reopen a new command.
The impact of this is that each time you close(cmd), you print the current set of randomly sorted lines. And awk closes cmd automatically once you come to the end of the file.
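The close-and-reopen behaviour is easier to see with a deterministic command in place of sort -R. The following is a rough sketch, not a drop-in replacement: per_group is a hypothetical wrapper name, and passing the command through -v is my own assumption about how you might want to parameterize it.

```shell
# Hypothetical wrapper: run a command (default "sort -R") once per block of
# consecutive lines that share the same first colon-separated field.
per_group() {
  awk -F: -v cmd="${1:-sort -R}" '
    $1 != key { close(cmd) }      # first field changed: flush the previous block
    { key = $1; print | cmd }     # feed the current line to the (re)opened pipe
  '
}

# With plain "sort", each block is sorted on its own and the blocks stay in place.
printf '2:b\n2:a\n1:z\n1:y\n' | per_group sort
```

With plain sort this should print 2:a, 2:b, 1:y, 1:z. The 2 block stays ahead of the 1 block because sort only ever sees one block at a time.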
Of course, for this solution to work, it's vital that all lines with a shared first field are grouped together.
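If you are not sure whether a file meets that requirement, a quick check along these lines might help. This is a sketch under my own assumptions: is_grouped is a made-up helper name, and it treats the first colon-separated field as the group key, as in the question.

```shell
# Hypothetical helper: exits 0 only if every first-field value on stdin
# forms a single contiguous block of lines.
is_grouped() {
  awk -F: '
    NR == 1 || $1 != prev {
      if ($1 in seen) { bad = 1; exit }   # key came back after a different key
      seen[$1] = 1
      prev = $1
    }
    END { exit bad + 0 }
  '
}

# Example: the second input has a 1 line after the 2 block, so it fails the check.
printf '1:a\n1:b\n2:c\n' | is_grouped && echo "grouped"
printf '1:a\n2:c\n1:b\n' | is_grouped || echo "not grouped"
```

When the check fails, the input needs to be grouped by some other means before the awk one-liner will give the intended result.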
Upvotes: 2