user1797559
user1797559

Reputation: 77

Get top 10 and last 10 from a million records

I have a report which shows 2-4 million records. I get the records from oracle to java and push it to an excel report. All this is already done!

Now, I also need to add a new tab with top 10 and last 10 records. What would be the best way to do it?

Should i use PriorityQueue implementation in java or use a binary tree to keep a track of top 10 and last 10. I don't need to store the billion records in the data structure. I just need to save 10 at a time. ex:

PriorityQueue<DataObject> queueTop10 = new PriorityQueue<DataObject>(10, topComparator);
PriorityQueue<DataObject> queueLast10 = new PriorityQueue<DataObject>(10, leastComparator);
    while (data is coming from database)
    {
    // push to excel stuff here
    queueTop10 .add(dataObject);   OR binarytreeTop.insert(dataObject)
    queueLast10.add(dataObject);   OR binarytreeLeast.insert(dataObject)
    }

Please let me know if i can use some other data structure as well.

Thanks

Upvotes: 1

Views: 1953

Answers (4)

NimChimpsky
NimChimpsky

Reputation: 47300

4 billion records in an excel spreadsheet ? nah, you don't https://superuser.com/questions/366468/what-is-the-maximum-allowed-rows-in-a-microsoft-excel-xls-or-xlsx

You should do this on the database, and not rely on the java implementation. For this many records it is bound to be less efficient than an optimized db query.

Upvotes: 0

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726987

PriorityQueue<T> will not work with your code as-is, because 10 in the constructor is the initial capacity; your queue will grow to 1B items as you go.

However, TreeSet<T> will work, with a small modification. You need to add code that removes the eleventh item every time the queue grows past ten:

TreeSet<DataObject> top10 = new TreeSet<DataObject>(topComparator);
TreeSet<DataObject> bottom10 = new TreeSet<DataObject>(leastComparator);
while (data is coming from database) {
    top10.add(dataObject);
    if (top10.size() == 11) {
        top10.pollLast();
    }
    bottom10.add(dataObject);
    if (bottom10.size() == 11) {
        bottom10.pollLast();
    }
}

Upvotes: 0

erickson
erickson

Reputation: 269857

Top hit algorithms use a min-heap (PriorityQueue in Java), but there should be some size checking in your algorithm. Suppose each item has a score, and you want to collect the 10 items with the highest score. PriorityQueue efficiently exposes the item with the lowest score:

PriorityQueue<DataObject> top = new PriorityQueue(10, comparator);
for (DataObject item : items) {
  if (top.size() < 10) top.add(item);
  else if(comparator.compare(top.peek(), item) < 0) {
    top.remove();
    top.add(item);
  }
}

Upvotes: 2

nullPointer
nullPointer

Reputation: 133

You can use a priority queue since it acts like a heap in Java. See How does Java's PriorityQueue differ from a min-heap? If no difference, then why was it named PriorityQueue and not Heap?

Upvotes: 0

Related Questions