user751651
user751651

Reputation:

Find the 10,000 largest out of 1,000,000 total values

I have a file that has 1,000,000 float values in it. I need to find the 10,000 largest values.

I was thinking of:

  1. Reading the file
  2. Converting the strings to floats
  3. Placing the floats into a max-heap (a heap where the largest value is the root)
  4. After all values are in the heap, removing the root 10,000 times and adding those values to a list/arraylist.

I know I will have

  1. 1,000,000 inserts into the heap
  2. 10,000 removals from the heap
  3. 10,000 inserts into the return list

Would this be a good solution? This is for a homework assignment.

Upvotes: 0

Views: 321

Answers (4)

Pops
Pops

Reputation: 30828

Sorting is expensive, and your input set is not small. Fortunately, you don't care about order. All you need is to know that you have the top X numbers. So, don't sort.

How would you do this problem if, instead of looking for the top 10,000 out of 1,000,000, you were looking for the top 1 (i.e. the single largest value) out of 100? You'd only need to keep track of the largest value you'd seen so far, and compare it to the next number and the next one until you found a larger one or you ran out of input. Could you expand that idea back out to the input size you're looking at? What would be the big-O (hint: you'd only be looking at each input number one time)?

Final note since you said this was homework: if you've just been learning about heaps in class, and you think your teacher/professor is looking for a heap solution, then yes, your idea is good.

Upvotes: 0

interjay
interjay

Reputation: 110118

Your solution is mostly good. It's basically a heapsort that stops after getting K elements, which improves the running time from O(NlogN) (for a full sort) to O(N + KlogN). Here N = 1000000 and K = 10000.

However, you should not do N inserts to the heap initially, as this would take O(NlogN) - instead, use a heapify operation which turns an array to a heap in linear time.

If the K numbers don't need to be sorted, you can find the Kth largest number in linear time using a selection algorithm, and then output all numbers larger than it. This gives an O(n) solution.

Upvotes: 7

leobelones
leobelones

Reputation: 598

How about using mergesort(log n operations in worst case scenario) to sort the 1,000,000 integers into an array then get the last 10000 directly?

Upvotes: 0

usumoio
usumoio

Reputation: 3568

Could you merge sort the values in the array after you have read them all in? This is a fast way to sort the values. Then you could request your_array[10000] and you would know that it is the 10000th largest. Merge sort sounds like what you want. Also if you really need speed, you could look into format your values for radix sort, that would take a bit of formatting but it sounds like that would be the absolute fastest way to solve this problem.

Upvotes: -1

Related Questions