Reputation: 3152

What would be the fastest way to sort an array of words containing a-z and spaces only?

I'd like to know if there's some faster way to sort such array than quicksort/mergesort.

Maximum array's length is 10^6. Word's length is >=10 and <= 100 and the word can contain a-z and spaces (27 different characters in total). Characters are not unique in the words (they can repeat). All the words in an array are equally long.

Upvotes: 5

Answers (5)

Jan Boonen

Reputation: 93

Why not a distribution sort per three characters: that would need a count storage of 19683 (27*27*27) elements, which should be feasible, and then at most 34 passes are needed.

But very soon, the sublists per key (multiple of three characters) will be short enough to use an insertion sort or alike on the remaining of the string. 1.000.000/(27^3) is about 50

The same mechanism can be used with longer keys if they have long prefixes in common i.e. and the first 30 characters would divide the list in only 20 or 30 sublists. Then you don't represent the keys as numbers, but as strings, and store them in a dictionairy, which is slower, but less passes are needed then, and maybe less memory as well. Also it would need around N*log(M) lookups with M the number of different keys in a binairy tree, but hashing is also a possibility.

Upvotes: 0

roman

Reputation: 117345

Well I've read (and upvoted) an answers about radix sort and radix trie, very informative.
But.
In case of radix sort - you need to make 91 passes of N elements, so it will be 91 * N. I'm not talking about additional space.
In case of mergesort you have N * log N compares, and since log N = log 1000000 ~ 20 you got 20 * N compares.

So which one is faster? :) Or may be I've mistaken somewhere?

Upvotes: 1

Moustafa Alzantot

Reputation: 371

The lower bound for any comparison based sort is O(nlog(n)). You can't have any sorting algorithm based on comparing elements to each other that runs on a worst case lower than this limit.

both merge sort and heap sort have a worst case running time of O(nlog(n))... And quick sort have a worst case running time of O(n^2), but the average running time is O(n^log(n)).

It is worth to mention that although quick sort have a worst time running time of O(N^2), it sometimes beats others algorithms with O(nlog(n)) running time (like heapsort) due to having a small constant factor and suitability for efficient execution on current machine architectures.

linear sorting algorithms that allows for sorting integers (but not only limited to them) in linear time O(n) on a non comparative basis (Examples: counting sort, bucket sort, and radix sort)

MSD radix sort can sort strings using lexicographic ally order of digits (in this case characters) and from the left to the right.

It sorts all strings first using the leftmost character using another linear sorting algorithm (say bucket sort), then sort them again using the second from the left character and so on until they are sorted by the rightmost character. At the end the array will be completely sorted.

This algorithm will have a running time of O(k*N) where N is the number of elements, and k is the average key length (word length in this case it will be >=10 && <=100).

Upvotes: 1

The Internet

Reputation: 8103

The ascii values can be computed so essentially this is an integer sort. Comparison based sorting routines will at best get you O(n lg n) - Merge Sort (with additional space needed for creating two more arrays of size n/2) or O(n^2) at worst (insertion sort, quicksort, but they have no additional space complexity). These are asymptotically slower than a linear sorting algorithm. I recommend looking at CLRS (http://www.amazon.com/Introduction-Algorithms-Thomas-H-Cormen/dp/0262033844). The chapter on sorting in linear time. O(n) is probably the best you can do in this scenario. Also, this post might help. Sorting in linear time?

I'd check out radix sort. http://en.wikipedia.org/wiki/Radix_sort

Upvotes: 0

amit

Reputation: 178411

You can put all the words in a trie (or a radix tree), and then print it in a DFS order, starting from the "smaller" lexicographical letter at each level in the DFS.

This solution will be O(n* |S|) where |S| is the average string length.

Simple example:

Let the set of strings be [ac,ab,aca]:

The resulting trie will be:

And a DFS (which prefers lexicographically smaller characters): The DFS will start from a, go to b, and then to the end sign ($) and will first print ab, then go back to a, and right to c, and to the next $ sign, and will print ac, and next to a and its $ and will print aca, resulting in printing:

ab
ac
aca

As expexted.

Upvotes: 8

What would be the fastest way to sort an array of words containing a-z and spaces only?

Answers (5)

Related Questions