Reputation: 516
What is the worst-case complexity of sorting n strings of n characters each? Is it just n times the average case, O(n log n), or something else?
Upvotes: 6
Views: 2766
Reputation: 2579
Sorting N items with MergeSort requires O(N log N) comparisons. If the time to compare two items is O(1), then the total running time is O(N log N). However, comparing two strings of length N requires O(N) time, so a naive implementation might get stuck with O(N*N log N) time.
This seems wasteful because we are not taking advantage of the fact that there are only N strings around to make comparisons. We might somehow preprocess the strings so that comparisons take less time on average.
Here is an idea. Create a trie and insert the N strings into it. The trie will have O(N*N) nodes and require O(N*N) time to build. Traverse the trie and assign an integer "ranking" to each node; if R(Node1) < R(Node2), then the string associated with Node1 comes before the string associated with Node2 in dictionary order.
Now proceed with Mergesort, doing each comparison in O(1) time by looking up the rankings in the trie. The total running time will be O(N*N + N*logN) = O(N*N).
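The trie-and-ranking idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `rank_strings`, the nested-dict trie, and the `END` sentinel are hypothetical choices, and the alphabet is assumed to have constant size so that ordering the children of a node costs O(1).

```python
def rank_strings(strings):
    """Map each distinct string to its rank in dictionary order via a trie."""
    END = ""  # hypothetical sentinel key; no single character equals ""
    root = {}

    # Build the trie: one node per distinct prefix, O(total characters).
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = s  # mark that a complete string ends at this node

    # Preorder walk, visiting children in character order, so strings
    # are encountered (and numbered) in dictionary order.
    ranks = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if END in node:
            ranks[node[END]] = len(ranks)
        # Push children in reverse so the smallest character is popped first.
        for ch in sorted((k for k in node if k != END), reverse=True):
            stack.append(node[ch])
    return ranks


words = ["banana", "apple", "band", "ban", "apple"]
ranks = rank_strings(words)
# The sort itself now compares small integers rather than whole strings.
print(sorted(words, key=ranks.__getitem__))
```

Looking up a string's rank still hashes the string once, but that happens once per string rather than once per comparison, so it stays within the O(N*N) preprocessing budget.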
Edit: My answer is very similar to that of @amit. However I proceed with mergesort where he proceeds with radixsort after the trie building step.
Upvotes: 0
Reputation: 178511
As @orangeoctopus said, using a standard comparison-based sorting algorithm on a collection of n strings of size n will result in O(n^2 * logn) computation.
However, note that you can do it in O(n^2) with variations on radix sort.
The simplest way to do it [in my opinion] is to do O(n) work per string (such as inserting it into a trie), and you do it n times, for a total of O(n^2).
It is easy to see you cannot do any better than O(n^2), since just reading the data is O(n^2); thus this solution is optimal in terms of big-O time complexity.
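As one possible radix-sort variation, here is a minimal sketch of an LSD (least-significant-digit first) radix sort. It is not necessarily the exact variant meant above. It assumes all strings have the same length and that the alphabet has constant size, so each pass costs O(n) and the n passes cost O(n^2) in total. The name `radix_sort_fixed_length` is made up for illustration.

```python
def radix_sort_fixed_length(strings):
    """LSD radix sort for strings that all share the same length."""
    if not strings:
        return []
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("this sketch assumes equal-length strings")

    result = list(strings)
    # One stable bucketing pass per character position, last position first.
    for pos in range(length - 1, -1, -1):
        buckets = {}
        for s in result:
            buckets.setdefault(s[pos], []).append(s)
        # Concatenate buckets in character order; order within a bucket
        # is preserved, which keeps each pass stable.
        result = [s for ch in sorted(buckets) for s in buckets[ch]]
    return result


data = ["cab", "abc", "bca", "aaa", "cba", "abc"]
print(radix_sort_fixed_length(data))
```

No two strings are ever compared as a whole here; each pass only looks at one character per string, which is where the log factor disappears.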
Upvotes: 3
Reputation: 39923
When you are talking about O notation with two things with different lengths, typically you want to use different variables, like M and N.
So, if your merge sort is O(N log N), where N is the number of strings, and comparing two strings is O(M), where M scales with the length of the string, then you'll be left with:
O(N log N) * O(M)
or
O(M N log N)
where M is the string length and N is the number of strings. You want to use different labels because they don't mean the same thing.
In the strange case where the average string length scales with the number of strings, like if you had a matrix stored in strings or something like that, you could argue that M = N, and then you'd have O(N^2 log N).
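To see the M factor concretely, here is a small sketch that counts individual character comparisons during a comparison sort. The helper name `count_char_comparisons` is hypothetical, and Python's built-in `sorted` stands in for merge sort here. The input is a bad case for comparisons: strings that differ only in their last character, so every comparison scans all M characters.

```python
import functools


def count_char_comparisons(strings):
    """Sort strings, returning (sorted list, number of character comparisons)."""
    count = 0

    def compare(a, b):
        nonlocal count
        # Scan until the first differing character, counting each step.
        for x, y in zip(a, b):
            count += 1
            if x != y:
                return -1 if x < y else 1
        # One string is a prefix of the other: the shorter one sorts first.
        return len(a) - len(b)

    ordered = sorted(strings, key=functools.cmp_to_key(compare))
    return ordered, count


M, N = 8, 5
# N strings of length M sharing a prefix of length M - 1.
data = ["a" * (M - 1) + c for c in "ecadb"]
ordered, count = count_char_comparisons(data)
print(ordered, count)
```

With this input the count is always M times the number of string comparisons the sort performs, which is the O(M) * O(N log N) product from above.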
Upvotes: 8