Reputation: 4114
I like to use the -u option of the UNIX sort utility to get unique lines based on a particular subset of columns, e.g. sort -u -k1,1 -k4,4
I have looked extensively in the UNIX sort and GNU sort documentation, and I cannot find any guarantee that the -u option will return the first instance (like the uniq utility) after sorting by the specified keys.
It seems to work as desired in practice (sort by keys, then give first instance of each unique key combination), but I was hoping for some kind of guarantee in the documentation to put my paranoia at ease.
Does anyone know of such a guarantee?
Upvotes: 0
Views: 64
Reputation: 3833
I think the code for such a small utility is likely the only place you'll find such a guarantee. You can enable more debugging output as well if you'd like to see how it is working.
If you look through the code for GNU sort, it appears that the uniqueness testing happens after all sorting is completed, when it is iterating through the sorted contents of the temporary files created by the sorting process.
This happens in a while loop that compares the previous line, savedline, with smallest, which is the next-smallest input line that would be output.
So my reading is that it applies your sorting criteria first, then removes duplicates from the output as the last step.
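
If you would rather not depend on how sort -u picks among lines with equal keys, one option is to make the "first instance" choice explicit before sorting. Below is a minimal sketch, not anything taken from the sort source: it assumes whitespace-separated columns, that "first" means first in input order, and that byte-wise ordering (LC_ALL=C) is acceptable. The function name first_per_key and the sample data are made up for illustration.

```shell
# Hypothetical helper: keep only the first input line for each
# (column 1, column 4) pair, then sort the surviving lines by those keys.
# Reads from the named files, or from stdin if none are given.
first_per_key() {
    # awk prints a line only the first time its key pair is seen.
    awk '!seen[$1, $4]++' "$@" | LC_ALL=C sort -k1,1 -k4,4
}

# Made-up sample input: column 5 records the order in which each
# key pair appeared.
sample='b 0 0 2 first
a 0 0 1 first
b 0 0 2 second
a 0 0 1 second
a 0 0 3 first'

result=$(printf '%s\n' "$sample" | first_per_key)
printf '%s\n' "$result"
```

Because the keys are already unique by the time sort runs, the result no longer depends on which duplicate sort -u would have kept.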
Upvotes: 1