Reputation: 1
How would you perform a Unix sort only on an internal column?
The following statement seems reasonable, but it unexpectedly forgets about the first randomization step: it produces the same output when repeated.
$ sort --random-sort test.txt | sort --key=2,2
1 a 2
2 a 1
1 b 2
2 b 1
By the way, my eventual goal is to create stratified random samples (which first requires randomizing and then grouping).
Upvotes: 0
Views: 187
Reputation: 113934
If you want some randomness to remain, you need to add the --stable option to the second sort:
$ sort --random-sort test.txt | sort --key=2,2 --stable
2 a 1
1 a 2
1 b 2
2 b 1
$ sort --random-sort test.txt | sort --key=2,2 --stable
1 a 2
2 a 1
1 b 2
2 b 1
This is documented in the GNU sort manual on gnu.org:
A pair of lines is compared as follows: sort compares each pair of fields, in the order specified on the command line, according to the associated ordering options, until a difference is found or no fields are left. If no key fields are specified, sort uses a default key of the entire line. Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified. The --stable (-s) option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order. The --unique (-u) option also disables the last-resort comparison.
In other words, in your case, if two lines compare equal under --key=2,2, sort will, by default, fall back to comparing the entire line, which undoes the random order within each group. Specifying --stable suppresses that last-resort comparison, so the original (randomized) order is preserved for those lines.
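To see the last-resort comparison without any randomness involved, here is a minimal sketch that feeds sort a fixed input in place of the shuffled test.txt. The four lines are the ones from the question; the variable names and LC_ALL=C (used only to make the comparison locale-independent) are my own additions, and the long options assume GNU sort.

```shell
# Fixed input, deliberately ordered so that ties on column 2
# are NOT already in whole-line order.
input=$(printf '%s\n' '2 b 1' '2 a 1' '1 b 2' '1 a 2')

# Default: lines that tie on column 2 are re-compared as whole lines,
# so the input order within each group is lost.
default_out=$(printf '%s\n' "$input" | LC_ALL=C sort --key=2,2)

# --stable: lines that tie on column 2 keep their input order.
stable_out=$(printf '%s\n' "$input" | LC_ALL=C sort --key=2,2 --stable)

printf 'default:\n%s\n\nstable:\n%s\n' "$default_out" "$stable_out"
# default: 1 a 2 / 2 a 1 / 1 b 2 / 2 b 1
# stable:  2 a 1 / 1 a 2 / 2 b 1 / 1 b 2
```

The default result is the same fixed order the question kept getting, whatever order the input arrived in; with --stable, each group retains whatever order the first sort produced.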
Upvotes: 1