Abhishek

Reputation: 516

String sorting using Merge Sort

What is the worst-case complexity of sorting n strings of n characters each? Is it just n times the average case, O(n log n), or something else?

Upvotes: 6

Views: 2766

Answers (3)

Ali Ferhat

Reputation: 2579

Sorting N items with merge sort requires O(N log N) comparisons. If comparing two items takes O(1) time, the total running time is O(N log N). However, comparing two strings of length N takes O(N) time in the worst case, so a naive implementation might be stuck with O(N*N log N) time.
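To make the naive cost concrete, here is a minimal sketch of a merge sort that counts how many characters it inspects. The names `compare`, `merge_sort`, and `char_comparisons` are made up for illustration, and the test input (strings that differ only in their last character) is just one assumed bad case, not necessarily the exact worst case.

```python
def compare(a, b, counter):
    """Three-way string compare that counts every character pair it inspects."""
    for x, y in zip(a, b):
        counter[0] += 1
        if x != y:
            return -1 if x < y else 1
    # One string is a prefix of the other: the shorter one sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def merge_sort(items, counter):
    """Plain top-down merge sort; counter is a one-element list used as a tally."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j], counter) <= 0:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def char_comparisons(n):
    """Sort n strings of length n that share an (n - 1)-character prefix."""
    # Assumed bad case: every comparison has to scan all n characters.
    strings = ["a" * (n - 1) + chr(ord("b") + i) for i in range(n)]
    counter = [0]
    result = merge_sort(strings[::-1], counter)
    assert result == strings
    return counter[0]
```

On this input every comparison scans all N characters, so the tally is N times the number of comparisons merge sort performs.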

This seems wasteful because we are not taking advantage of the fact that there are only N strings around to make comparisons. We might somehow preprocess the strings so that comparisons take less time on average.

Here is an idea. Create a trie and insert the N strings into it. The trie will have O(N*N) nodes and take O(N*N) time to build. Traverse the trie in dictionary order and assign an integer "ranking" to each node: if R(Node1) < R(Node2), then the string associated with Node1 comes before the string associated with Node2 in a dictionary.

Now proceed with merge sort, doing each comparison in O(1) time by looking up the rankings in the trie. The total running time will be O(N*N + N*log N) = O(N*N).
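A rough sketch of this idea, under some assumptions of mine: nodes are plain dicts, ranks come from a preorder walk that visits children in alphabetical order, and each string remembers its terminal node so the rank lookup is O(1). The function names are hypothetical.

```python
def new_node():
    # Hypothetical node layout: a child map plus an integer rank.
    return {"children": {}, "rank": None}


def rank_strings(strings):
    """Return one integer rank per string, consistent with dictionary order."""
    root = new_node()
    terminals = []  # terminal node of each input string, in input order
    for s in strings:
        node = root
        for ch in s:
            children = node["children"]
            if ch not in children:
                children[ch] = new_node()
            node = children[ch]
        terminals.append(node)

    # Iterative preorder walk; pushing children in reverse alphabetical
    # order means the smallest character is popped (and ranked) first.
    # Sorting the children is O(1) per node for a fixed-size alphabet.
    next_rank = 0
    stack = [root]
    while stack:
        node = stack.pop()
        node["rank"] = next_rank
        next_rank += 1
        for ch in sorted(node["children"], reverse=True):
            stack.append(node["children"][ch])
    return [t["rank"] for t in terminals]


def merge_sort_by_rank(pairs):
    """Merge sort over (rank, string) pairs; each comparison is one integer test."""
    if len(pairs) <= 1:
        return list(pairs)
    mid = len(pairs) // 2
    left = merge_sort_by_rank(pairs[:mid])
    right = merge_sort_by_rank(pairs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] <= right[j][0]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def sort_with_trie_ranks(strings):
    ranks = rank_strings(strings)
    return [s for _, s in merge_sort_by_rank(list(zip(ranks, strings)))]
```

Equal strings end at the same node and therefore share a rank, which the merge step treats as a tie.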

Edit: My answer is very similar to that of @amit. However, I proceed with merge sort where he proceeds with a radix sort after the trie-building step.

Upvotes: 0

amit

Reputation: 178511

As @orangeoctopus notes, using a standard sorting algorithm on a collection of n strings of length n will result in O(n^2 * log n) computation.

However - note that you can do it in O(n^2), with variations on radix sort.

The simplest way to do it [in my opinion] is:

  1. Build a trie and populate it with all your strings. Inserting each string is O(n) and you do it n times, for a total of O(n^2).
  2. Do a DFS on the trie, visiting children in alphabetical order. Each time you encounter an end-of-string marker, add that string to the sorted collection. Strings are added in lexicographic order, so your list will be sorted when you are done.
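The two steps above can be sketched roughly as follows. This is only an illustration: the dict-based node layout, the `count` field used as the end-of-string marker (so duplicates survive), and the name `trie_sort` are all my own assumptions.

```python
def trie_sort(strings):
    """Sort strings by inserting them into a trie and reading it back with a DFS."""
    # Step 1: build the trie. "count" doubles as the end-of-string marker
    # and records how many input strings end at this node.
    root = {"children": {}, "count": 0}
    for s in strings:
        node = root
        for ch in s:
            children = node["children"]
            if ch not in children:
                children[ch] = {"children": {}, "count": 0}
            node = children[ch]
        node["count"] += 1

    # Step 2: iterative DFS, visiting children in alphabetical order.
    # "path" holds the characters from the root to the current node, so a
    # string is only materialized when an end-of-string marker is found.
    result = [""] * root["count"]
    path = []
    stack = [(root, iter(sorted(root["children"])))]
    while stack:
        node, remaining = stack[-1]
        ch = next(remaining, None)
        if ch is None:
            # All children visited: backtrack one level.
            stack.pop()
            if path:
                path.pop()
            continue
        child = node["children"][ch]
        path.append(ch)
        if child["count"]:
            result.extend(["".join(path)] * child["count"])
        stack.append((child, iter(sorted(child["children"]))))
    return result
```

The explicit stack is there because a recursive DFS would go as deep as the longest string, which can exceed Python's default recursion limit when the strings are long.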

It is easy to see that you cannot do any better than O(n^2), since merely reading the data is O(n^2). Thus this solution is optimal in terms of big-O time complexity.

Upvotes: 3

Donald Miner

Reputation: 39923

When you are talking about O notation with two things with different lengths, typically you want to use different variables, like M and N.

So, if your merge sort is O(N log N), where N is the number of strings... and comparing two strings is O(M) where M scales with the length of the string, then you'll be left with:

O(N log N) * O(M)

or

O(M N log N)

where M is the string length and N is the number of strings. You want to use different labels because they don't mean the same thing.

In the strange case where the average string length scales with the number of strings, like if you had a matrix stored in strings or something like that, you could argue that M = N, and then you'd have O(N^2 log N)

Upvotes: 8
