Reputation: 298
A researcher has a database of 100 million records of people. The researcher wants to study the distribution of given names according to other criteria such as zodiac sign, birth year, etc, so wants to sort by name with the option of further sorting later.
Which sort should I use?
A. selection
B. quick
C. heap
D. insertion
E. merge
Thanks!
Upvotes: 4
Views: 1546
Reputation: 13188
Someone posted a duplicate, and this was going to be my answer. Since I went through the effort to type all this, I may as well share it for future readers.
Each sorting algorithm has its best and worse use cases. This is how I try to think about it:
Obviously that is a very breif overview. You can find a lot more info on Wikipedia and through a Google search like: "When to use [Insert Algorithm Here]"
Hope that helps!
Upvotes: 1
Reputation: 712
The most efficient sorting algorithm, wouldn't be a traditional one.
Since you are sorting based on criteria such as birth year and zodiac sign, I would do a "stack sort" (I just made that up).
It would work this way.
Create a data structure for each possible sorted value. Let's use birth year for example. In birth year, there are only going to be ~100 different values that it could be.
When you're done looping through each record, you've now got 100 arrays, each filled with records that have that particular birth year. The great part about this is that you've done it in O(n) time, so it's much faster than any other sorting algorithm. This also works for zodiac signs etc...
Think outside of the box. This approach is very useful when sorting a large dataset (n) with possible values (m), where m << n.
Upvotes: 0
Reputation: 533472
If you want to get a histogram, I wouldn't sort the data. I would just go through all the data counting all the combinations of interest. This is an O(N) operation.
Sorting the data first is unlikely to improve speed. This is an O(N*log(N)) operation.
If wanted to sort all the record, I would use a Collection.sort() with a custom comparator which has all the fields you need to compare. You would have to load all the records into memory which will take a few GB, but once you have done this it should be pretty fast.
The only way to make this faster would be to filter down the criteria. If you do this, I would create a Collection which has a copy of the records of interest and sort that.
Upvotes: 0
Reputation: 437336
It's not really my answer since you reached it yourself, but here it is for better visibility:
O(n^2)
average running time, which isn't going to cut it for 100M items.Update: Exam-related advice
I have to admit that point 2 above (preserve the sort by name) is not totally clear from the problem description. However, this is an exam question and there has to be some way of trimming the options down to one. This is only made possible by demanding a stable sort, so the requirement is there even if the wording is not ironclad.
This way of practical thinking makes it IMHO much easier to reach definitive answers for some types of exam questions.
Upvotes: 6
Reputation: 272467
Try mapping your requirements to the comparison table at http://en.wikipedia.org/wiki/Sort_algorithms#Comparison_of_algorithms.
Upvotes: 4