Reputation: 77
I have a report which shows 2-4 million records. I get the records from oracle to java and push it to an excel report. All this is already done!
Now, I also need to add a new tab with top 10 and last 10 records. What would be the best way to do it?
Should i use PriorityQueue implementation in java or use a binary tree to keep a track of top 10 and last 10. I don't need to store the billion records in the data structure. I just need to save 10 at a time. ex:
PriorityQueue<DataObject> queueTop10 = new PriorityQueue<DataObject>(10, topComparator);
PriorityQueue<DataObject> queueLast10 = new PriorityQueue<DataObject>(10, leastComparator);
while (data is coming from database)
{
// push to excel stuff here
queueTop10 .add(dataObject); OR binarytreeTop.insert(dataObject)
queueLast10.add(dataObject); OR binarytreeLeast.insert(dataObject)
}
Please let me know if i can use some other data structure as well.
Thanks
Upvotes: 1
Views: 1953
Reputation: 47300
4 billion records in an excel spreadsheet ? nah, you don't https://superuser.com/questions/366468/what-is-the-maximum-allowed-rows-in-a-microsoft-excel-xls-or-xlsx
You should do this on the database, and not rely on the java implementation. For this many records it is bound to be less efficient than an optimized db query.
Upvotes: 0
Reputation: 726987
PriorityQueue<T>
will not work with your code as-is, because 10 in the constructor is the initial capacity; your queue will grow to 1B items as you go.
However, TreeSet<T>
will work, with a small modification. You need to add code that removes the eleventh item every time the queue grows past ten:
TreeSet<DataObject> top10 = new TreeSet<DataObject>(topComparator);
TreeSet<DataObject> bottom10 = new TreeSet<DataObject>(leastComparator);
while (data is coming from database) {
top10.add(dataObject);
if (top10.size() == 11) {
top10.pollLast();
}
bottom10.add(dataObject);
if (bottom10.size() == 11) {
bottom10.pollLast();
}
}
Upvotes: 0
Reputation: 269857
Top hit algorithms use a min-heap (PriorityQueue
in Java), but there should be some size checking in your algorithm. Suppose each item has a score, and you want to collect the 10 items with the highest score. PriorityQueue
efficiently exposes the item with the lowest score:
PriorityQueue<DataObject> top = new PriorityQueue(10, comparator);
for (DataObject item : items) {
if (top.size() < 10) top.add(item);
else if(comparator.compare(top.peek(), item) < 0) {
top.remove();
top.add(item);
}
}
Upvotes: 2
Reputation: 133
You can use a priority queue since it acts like a heap in Java. See How does Java's PriorityQueue differ from a min-heap? If no difference, then why was it named PriorityQueue and not Heap?
Upvotes: 0