Reputation: 265281
Is there a difference in the order of uniq and sort when calling them in a shell script? I mean in terms of time and space.
grep 'somePattern' | uniq | sort
vs.
grep 'somePattern' | sort | uniq
A quick test on a 140k-line text file showed a slight speed improvement (5.5 s vs. 5.0 s) for the first method (get unique values, then sort).
I don't know how to measure memory usage, though.
The question now is: does the order make a difference, or does it depend on the lines returned by grep (many or few duplicates)?
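One way to measure both time and memory is a small benchmark on generated data. This is a minimal POSIX sh sketch under stated assumptions: the data is synthetic (140,000 lines drawn from 1,000 values, mostly non-adjacent duplicates), timing uses whatever `time` the shell provides, and the memory report relies on GNU time's `-v` option, which is skipped if that is not installed. It also counts output lines, because the two orders do not produce the same result.

```shell
#!/bin/sh
# Sketch: compare the two pipeline orders on generated data.
# The line count and value range are illustrative assumptions.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

# 140,000 lines drawn from 1,000 distinct values in random order,
# so most duplicates are NOT adjacent to each other.
awk 'BEGIN { srand(1); for (i = 0; i < 140000; i++) print "line" int(rand() * 1000) }' > "$sample"

# Time each full pipeline (the timing report goes to stderr).
if command -v time >/dev/null 2>&1; then
  echo "uniq | sort:"
  time sh -c 'uniq "$1" | sort > /dev/null' sh "$sample"
  echo "sort | uniq:"
  time sh -c 'sort "$1" | uniq > /dev/null' sh "$sample"
fi

# Peak memory of sort alone, via GNU time's -v report (skipped if unavailable).
# sort must read all input before writing output; uniq only holds the previous line.
if /usr/bin/time -v true >/dev/null 2>&1; then
  /usr/bin/time -v sort "$sample" 2>&1 >/dev/null | grep 'Maximum resident set size' || true
fi

# The outputs differ, not just the timings.
uniq_sort_lines=$(uniq "$sample" | sort | wc -l | tr -d ' ')
sort_uniq_lines=$(sort "$sample" | uniq | wc -l | tr -d ' ')
echo "lines after uniq | sort: $uniq_sort_lines"
echo "lines after sort | uniq: $sort_uniq_lines"
```

On input like this, uniq | sort leaves almost every line in place, while sort | uniq reduces it to at most 1,000 lines, so a timing difference between the two does not compare equivalent work.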
Upvotes: 3
Views: 3124
Reputation: 68278
The only correct order is to call uniq after sort, since the man page for uniq says:
Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
Therefore it should be
grep 'somePattern' | sort | uniq
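A tiny input where one duplicate is not adjacent shows what "successive identical lines" means in practice. This is a minimal POSIX sh sketch; the sample words are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: the two "apple" lines are not next to each other.
input='apple
banana
apple
banana
banana'

# uniq first: only the adjacent "banana banana" pair is collapsed,
# then sort brings the surviving duplicates together.
uniq_first=$(printf '%s\n' "$input" | uniq | sort)

# sort first: every duplicate becomes adjacent, so uniq removes them all.
sort_first=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq | sort:\n%s\n\n' "$uniq_first"
printf 'sort | uniq:\n%s\n' "$sort_first"
```

The first pipeline prints apple, apple, banana, banana; the second prints apple, banana. The two orders therefore compute different things, not the same thing at different speeds.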
Upvotes: 9
Reputation: 83
uniq relies on its input being sorted to remove all duplicates, since it only compares each line with the one immediately before it. That is why sort is always run before uniq. Try it and see.
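The previous-versus-current comparison can be modelled in one line of awk. This is a hypothetical helper (`adjacent_uniq`), a rough model of uniq's behaviour rather than its actual implementation, and the sketch checks it against the real uniq on the same input.

```shell
#!/bin/sh
# Rough model of uniq (not its real source): print a line only if it
# differs from the line immediately before it. Concatenating "" forces
# a string comparison, so "1" and "1.0" are treated as different lines.
adjacent_uniq() {
  awk 'NR == 1 || ($0 "") != prev { print } { prev = $0 "" }'
}

input='b
a
b
b'

# Unsorted: the first and third "b" are not adjacent, so both survive.
unsorted=$(printf '%s\n' "$input" | adjacent_uniq)
# Sorted: all "b" lines become adjacent, so only one survives.
sorted=$(printf '%s\n' "$input" | sort | adjacent_uniq)
# The real uniq on the same unsorted input, for comparison.
real=$(printf '%s\n' "$input" | uniq)

printf 'model, unsorted: %s\n' "$unsorted"
printf 'model, sorted:   %s\n' "$sorted"
printf 'real uniq:       %s\n' "$real"
```

Because the helper only ever remembers one previous line, its memory use does not grow with the input, which is also why uniq cannot catch duplicates that sort has not yet brought together.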
Upvotes: 3
Reputation: 71945
I believe that sort -u is suited to this exact scenario: it both sorts and removes duplicates. That should be more efficient than calling sort and uniq separately, in either order.
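A minimal check that sort -u matches sort | uniq for whole-line comparisons. The input words are invented for illustration; sort -u itself is part of the POSIX specification for sort.

```shell
#!/bin/sh
# Hypothetical input standing in for the output of grep 'somePattern'.
input='pear
apple
pear
fig
apple'

# Two processes: sort, then uniq.
two_step=$(printf '%s\n' "$input" | sort | uniq)
# One process: sort drops duplicate lines itself with -u.
one_step=$(printf '%s\n' "$input" | sort -u)

if [ "$two_step" = "$one_step" ]; then
  echo "same output:"
  printf '%s\n' "$one_step"
fi
```

Applied to the question, the pipeline becomes grep 'somePattern' | sort -u. With key options such as -k, sort -u compares only the keys, so in that case its result can differ from piping whole lines through uniq.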
Upvotes: 10