Reputation: 1751
I have to find the 90-100th largest number from an array when an array is expanding and the data are being populated or added every seconds or less than a seconds. May be considered as a stream of running data. I thought of using the data structure Binary tree but every time constructing the tree and placing the new data at the appropriate place will be quite tedious since data's are coming very fast.
I want to know which data structure will be most suitable to find the 90-100 th largest data.
Upvotes: 1
Views: 169
Reputation: 134005
One solution is to use a Min heap with size 100. As each item comes in, you compare it to the smallest item in the heap. If it's larger than the smallest item in the heap, you remove the smallest item and replace it with the new item. Whenever you want the 100th largest item, you just get the minimum item from the heap.
In pseudocode, it looks something like this:
heap = new MinMaxHeap()
while (item available)
{
if (heap.count < 100)
{
add item to heap
}
else if (item > heap.PeekMin())
{
remove minimum item from heap
add item to heap
}
}
Any time you want the 100th largest item, just do a PeekMin
on the heap.
Examining the smallest item (i.e. PeekMin
) is an O(1) operation. Removal and insertion are each O(log k), where k
is the number of items on the heap. In practice, assuming items are being presented in random order, you'll only have to do remove and replace on about 10% of the items. Worst case, when items are presented in ascending order, you'll have to do remove/replace on every item.
Now, if you want the 90th through 100th items, you'll need to sort the heap and take the first 11 items. Fortunately, a sorted array is a valid binary heap, so sorting won't destroy the heap property. And you're only sorting 100 items, so it's going to be very fast.
Implementing a simple binary heap as an array is very easy. I described it at length in a series of blog posts, starting with Priority Queues. The code examples are in C#, but you could easily convert them to php.
Another way to do this is with two heaps. The first heap is as described above: the top 100 items. The second heap would be a Max heap in which you store the smallest 11 items from the top 100. Your replacement policy on the second heap is the opposite. That is, you compare the new item to the item at the root of the heap and replace if the new item is smaller. With this structure, you'll still have to sort that second heap if you want the items in order, but sorting 11 items will be really fast.
Upvotes: 3