knittl

Reputation: 265281

Calling uniq and sort in different orders in shell

Does the order of uniq and sort matter when calling them in a shell script? I’m asking about both running time and memory usage.

grep 'somePattern' | uniq | sort

vs.

grep 'somePattern' | sort | uniq

A quick test on a text file of 140k lines showed a slight speed improvement (5.5 s vs. 5.0 s) for the first method (get unique values, then sort).

I don’t know how to measure memory usage, though.

The question now is: does the order make a difference? Or does it depend on the lines returned by grep (many or few duplicates)?

Upvotes: 3

Views: 3124

Answers (3)

Robert Munteanu

Reputation: 68278

The only correct order is to call uniq after sort, since the man page for uniq says:

Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).

Therefore it should be

grep 'somePattern' | sort | uniq

Upvotes: 9

Sven Schott

Reputation: 83

uniq depends on its input being sorted to remove all duplicates, since it only compares the current line with the previous one. That is why sort is always run before uniq. Try it and see.
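To make "try it and see" concrete, here is a small sketch comparing the two pipelines. The four-line sample input is made up for illustration and stands in for the grep output from the question.

```shell
# Made-up sample input whose duplicates are NOT adjacent.
input=$(printf 'b\na\nb\na\n')

# uniq first: no two neighbouring lines are identical, so uniq removes
# nothing, and sort then just orders the four lines.
uniq_then_sort=$(printf '%s\n' "$input" | uniq | sort)

# sort first: sorting puts identical lines next to each other,
# so uniq can collapse each run into a single line.
sort_then_uniq=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq | sort:\n%s\n\n' "$uniq_then_sort"
printf 'sort | uniq:\n%s\n' "$sort_then_uniq"
```

With this input, the first pipeline still prints every duplicate, while the second prints each distinct line once. The two orders only agree when the duplicates already happen to be adjacent in the input.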

Upvotes: 3

mqp

Reputation: 71945

I believe that sort -u is suited to this exact scenario: it both sorts the input and removes duplicates. That should be more efficient than calling sort and uniq as separate processes in either order.
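A quick way to check that sort -u gives the same result as sort | uniq is sketched below. The sample lines are invented for illustration, and the check says nothing about speed, only about the output.

```shell
# Made-up sample input containing duplicates.
input=$(printf 'pear\napple\npear\nfig\napple\n')

# Two processes: sort, then collapse adjacent duplicates.
two_step=$(printf '%s\n' "$input" | sort | uniq)

# One process: sort and drop duplicates in a single pass.
one_step=$(printf '%s\n' "$input" | sort -u)

if [ "$one_step" = "$two_step" ]; then
    echo "same output"
else
    echo "outputs differ"
fi
```

This equivalence is for plain whole-line comparison. If sort is given key or case-folding options, -u treats lines as duplicates when their keys compare equal, which can differ from what a plain uniq would do.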

Upvotes: 10
