Reputation:
I have a file that has 1,000,000 float values in it. I need to find the 10,000 largest values.
I was thinking of:
I know I will have
Would this be a good solution? This is for a homework assignment.
Upvotes: 0
Views: 321
Reputation: 30828
Sorting is expensive, and your input set is not small. Fortunately, you don't care about order. All you need is to know that you have the top X numbers. So, don't sort.
How would you do this problem if, instead of looking for the top 10,000 out of 1,000,000, you were looking for the top 1 (i.e. the single largest value) out of 100? You'd only need to keep track of the largest value you'd seen so far, and compare it to the next number and the next one until you found a larger one or you ran out of input. Could you expand that idea back out to the input size you're looking at? What would be the big-O (hint: you'd only be looking at each input number one time)?
Final note since you said this was homework: if you've just been learning about heaps in class, and you think your teacher/professor is looking for a heap solution, then yes, your idea is good.
Upvotes: 0
Reputation: 110118
Your solution is mostly good. It's basically a heapsort that stops after getting K elements, which improves the running time from O(NlogN)
(for a full sort) to O(N + KlogN)
. Here N = 1000000 and K = 10000.
However, you should not do N inserts to the heap initially, as this would take O(NlogN)
- instead, use a heapify operation which turns an array to a heap in linear time.
If the K numbers don't need to be sorted, you can find the Kth largest number in linear time using a selection algorithm, and then output all numbers larger than it. This gives an O(n)
solution.
Upvotes: 7
Reputation: 598
How about using mergesort(log n operations in worst case scenario) to sort the 1,000,000 integers into an array then get the last 10000 directly?
Upvotes: 0
Reputation: 3568
Could you merge sort the values in the array after you have read them all in? This is a fast way to sort the values. Then you could request your_array[10000] and you would know that it is the 10000th largest. Merge sort sounds like what you want. Also if you really need speed, you could look into format your values for radix sort, that would take a bit of formatting but it sounds like that would be the absolute fastest way to solve this problem.
Upvotes: -1