Reputation: 54697
I have a situation whereby I'm populating an ArrayList with TransactionEvents. TransactionEvent has a property "transaction ID". In the large majority of cases each new event has a transaction ID greater than the previous event's ID; however, this is not guaranteed, i.e. the data is almost sorted.
My question is this: how can I perform fast look-ups based on transaction ID? My current idea is to call Collections.binarySearch(...) and, if this fails, to perform a linear search. However, I notice the Javadoc states that the result of binarySearch is undefined if the data is unordered, so I may have to roll my own implementation.
Additional: the List is the basis for a TableModel, which currently performs very slowly when containing a large number of rows (100,000). Any help appreciated.
Upvotes: 5
Views: 3894
Reputation: 4477
I've cleaned things up a bit from my previous post. @Lizzard, your solution is best given the property that new entries are usually at the end. The solution below should perform better if you have random arrivals at the cost of more memory for the maps. It also lets you defer your array insertion (potentially O(n) worst case) until you actually need to draw the cell for a row below the earliest insertion point.
// sorted events (using natural ordering on eventID)
SortedSet<Event> model = new TreeSet<Event>();
ArrayList<Event> sortedList = new ArrayList<Event>();
Event lowestAddition, additionPrevEntry; // low-water mark for insertions
public void insert(Event x) {
    if (lowestAddition == null || x.compareTo(lowestAddition) < 0) {
        SortedSet<Event> headSet = model.headSet(x); // elements strictly below x
        additionPrevEntry = headSet.isEmpty() ? null : headSet.last(); // x's predecessor
        lowestAddition = x;
    }
    model.add(x); // add
}
public void materialize() {
    SortedSet<Event> tailSet = (additionPrevEntry == null)
            ? model : model.tailSet(additionPrevEntry);
    Event firstValue = tailSet.first(); // this element does not change its order
    int order = firstValue.getOrder();  // keep order on Event
    for (Event x : tailSet) {
        x.setOrder(order);
        if (order < sortedList.size()) sortedList.set(order, x);
        else sortedList.add(x);         // new events extend the list
        order++;
    }
    lowestAddition = null; additionPrevEntry = null;
}
Here is what your Swing code looks like; I assume you are using Swing since you want a table model:
// now your model code uses the array
public Object getValueAt(int row, int col) {
    return getColumn(sortedList.get(row), col); // ArrayList uses get(), not elementAt()
}
// you can gain significant performance by deferring
// materialization until you actually need it
public class DeferredJTable extends JTable {
    public void paintComponent(Graphics g) {
        // if you knew which rows of the table were being drawn
        // ahead of time, you could defer even further
        materialize();
        super.paintComponent(g);
    }
}
Upvotes: 0
Reputation: 18514
My first answer wasn't really what you were looking for. Now that I understand the problem better, give this a try. I only implemented the key parts. This will be a little more memory intensive, but since the ArrayList stores references rather than copies of the objects, the extra memory should be small in comparison with the actual object storage.
class TransactionEventStore
{
    private final ArrayList<TransactionEvent> byOrder = new ArrayList<TransactionEvent>(),
                                              byId    = new ArrayList<TransactionEvent>();

    private void insertByOrder(TransactionEvent e) { this.byOrder.add(e); }

    private void insertById(TransactionEvent e)
    {
        // scan backwards; for almost-sorted input this terminates quickly
        for (int i = this.byId.size() - 1; i >= 0; i--)
            if (e.getId() > this.byId.get(i).getId())
            {
                this.byId.add(i + 1, e); // insert just after the smaller element
                return;
            }
        this.byId.add(0, e); // smaller than every id seen so far
    }

    public void insert(TransactionEvent e)
    {
        this.insertByOrder(e);
        this.insertById(e);
    }
}
Now when you need to look up by insertion order, use this.byOrder, and when you need to look up by id, use this.byId.
Upvotes: 0
Reputation: 15907
ArrayList is for toy-sized problems, and 100,000 rows is getting a little out of toy space. That means you have to be more precise about the access patterns you need to support. A sorted ArrayList might be enough, and if processing speed is growing faster than your problem size, you might not want to bother, but a BTree will be faster at 100K elements.
ArrayList has problems at larger sizes: growing it means reallocating and copying the whole backing array, and a random insert shifts every later element. A two-level collection with a fixed page size (e.g. a BTree) helps because a grow means adding one page of (ideally) about sqrt(size) elements, and a random insert at worst splits one page in two.
With two needed sort orders, you can simply use two (sorted) BTrees
[edit] The answer to the earlier question is the key to the problem. For a 1000 element ArrayList, the insert costs 7 microseconds, for 1000000 elements 7 milliseconds. The BTree stays in the microseconds range (but could be twice as slow for 1000 element page size).
Indexed access you can create by keeping a count of the elements in each page. If you set a dirty flag on each page, you can use a background thread to update the start index of each page, or you can add bulk operations with delayed index building.
The index can be invalid, but it is only about sqrt(size) large. For 100K elements, rebuilding it means incrementing about 150 page indexes on average. That takes microseconds, not milliseconds.
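A minimal sketch of the two-level idea (a paged list rather than a full BTree; the PagedList class, its page size, and the Long ids are all illustrative, not a real library):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a two-level paged list: an insert shifts elements
// within one small page instead of the whole backing array.
class PagedList {
    private static final int PAGE_SIZE = 32; // ideally ~sqrt(total size)
    private final List<List<Long>> pages = new ArrayList<List<Long>>();

    void insert(long id) {
        if (pages.isEmpty()) pages.add(new ArrayList<Long>());
        // pick the first page whose last element is >= id (else the last page)
        int at = pages.size() - 1;
        for (int p = 0; p < pages.size(); p++) {
            List<Long> pg = pages.get(p);
            if (!pg.isEmpty() && pg.get(pg.size() - 1) >= id) { at = p; break; }
        }
        List<Long> page = pages.get(at);
        int i = 0;
        while (i < page.size() && page.get(i) < id) i++;
        page.add(i, id); // the shift is bounded by PAGE_SIZE
        if (page.size() > PAGE_SIZE) { // a full page splits in two
            List<Long> tail = new ArrayList<Long>(page.subList(PAGE_SIZE / 2, page.size()));
            page.subList(PAGE_SIZE / 2, page.size()).clear();
            pages.add(at + 1, tail);
        }
    }

    long get(int index) { // indexed access via per-page element counts
        for (List<Long> pg : pages) {
            if (index < pg.size()) return pg.get(index);
            index -= pg.size();
        }
        throw new IndexOutOfBoundsException();
    }
}
```

The point is the cost profile: a random insert touches one page, and a split copies about half a page rather than the whole structure.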
Upvotes: 1
Reputation: 4477
Why not just use a sorted collection as your table model instead of a list. TreeMap seems logical since your entries are all ordered. If you also need fast access by row or any other column, you can simply add a secondary map. Basically you are doing what database indexes do.
I thought for some reason that you could use map.headMap(key) and find the kth entry -- this won't work. You need to be able to get from table row -> EventID (or close to it).
If you use a model like this:
Map<EventID, Event> model = new TreeMap<EventID, Event>();
Conceptually your getValueAt() looks like this:
getValueAt(int row, column) {
    eventID = getSortPosition(row);
    Event e = model.get(eventID);
    return getColumn(e, column);
}
The key is being able to efficiently maintain a map from sort index -> key (an inverse map). This is non-trival since inserting a new event at the very top affects the absolute order of all those below it. It seems like there should be a CS answer here, but it escapes me.
Here is the most basic implementation: on every insert, update the map, then materialize the sorted map into a list.
ArrayList<Event> orderedEvents = new ArrayList<Event>();

public void insert(Event event) {
    model.put(event.getID(), event);
    // re-materialize the ordered list from the sorted map
    orderedEvents.clear();
    orderedEvents.addAll(model.values());
}
Your getValueAt() would be pretty simple.
getValueAt(int row, column) {
    Event e = orderedEvents.get(row);
    return getColumn(e, column);
}
I think you should reconsider your UI design. If you are having users browse a 100K-row table, adding a search filter will solve your performance problem.
Upvotes: 0
Reputation: 18514
My vote is that you insert into the list in order. Then you can do a binary search.
Upvotes: 0
Reputation: 17359
I had the same problem. The solution I came up with is a custom collection based on ArrayList which also includes a Map of all the elements. This is not hard to do. If you'd like me to post the source code, let me know.
Upvotes: 0
Reputation: 1041
I would use a binary search to get an approximate location of the id and then search outwards linearly. The downside is that if the id you're searching for isn't in the list, it will take O(n + log n), which is effectively O(n).
Binary searches are very easy to implement and I recommend reading the wikipedia article.
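A sketch of that combination (class and method names are illustrative; since binarySearch's result is technically undefined on almost-sorted data, the outward scan is the safety net):

```java
import java.util.Collections;
import java.util.List;

class AlmostSortedSearch {
    // Try binary search first; on a miss, scan outward from the
    // reported insertion point, since the data may be slightly unordered.
    static int find(List<Long> ids, long id) {
        int pos = Collections.binarySearch(ids, id);
        if (pos >= 0) return pos;                  // exact hit
        int from = Math.max(0, -pos - 2);          // decode insertion point - 1
        for (int lo = from, hi = from + 1; lo >= 0 || hi < ids.size(); lo--, hi++) {
            if (lo >= 0 && ids.get(lo) == id) return lo;
            if (hi < ids.size() && ids.get(hi) == id) return hi;
        }
        return -1;                                 // truly absent: O(n) worst case
    }
}
```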
Upvotes: 0
Reputation: 21884
From what you have said, it looks like fast look-ups are the most important thing here.
So perhaps you should be using a HashMap instead of an ArrayList: store your TransactionEvents in the HashMap using the TransactionID as the key. Lookups in a HashMap are O(1).
Note that adding to the HashMap can become quite slow if you exceed its initial capacity, since it has to do a re-hash. If you can, try to initialise it with a best guess (err on the high side) as to the number of items it will hold.
With 100k rows, you might have to increase your java heap size to prevent OutOfMemoryErrors.
java -Xms<initial heap size> -Xmx<maximum heap size>
Defaults are:
java -Xms32m -Xmx128m
EDIT:
If ordering really is important you could use a SortedMap.
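A minimal sketch of the HashMap approach (the String values stand in for TransactionEvent, and the 200000 capacity is just an illustrative guess erring on the high side):

```java
import java.util.HashMap;
import java.util.Map;

// O(1) average lookup keyed by transaction ID; capacity sized up front
// so the map never rehashes while loading ~100K rows.
class EventIndex {
    private final Map<Long, String> byId = new HashMap<Long, String>(200000);

    void put(long txId, String event) { byId.put(txId, event); }
    String get(long txId)             { return byId.get(txId); }
}
```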
Upvotes: 0
Reputation: 2066
You could keep your list sorted. If you insertion-sort as you add items, and the items to be added are almost sorted, then the insertions will still run in effectively constant time. This would then allow you to binary search in logarithmic time.
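A minimal sketch of that insertion (the SortedIds class and bare Long ids are illustrative stand-ins for the event list): scanning backward from the end means an almost-sorted arrival usually lands after one or two comparisons.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedIds {
    private final List<Long> ids = new ArrayList<Long>();

    // backward scan from the end: near O(1) per insert on almost-sorted input
    void add(long id) {
        int i = ids.size();
        while (i > 0 && ids.get(i - 1) > id) i--;
        ids.add(i, id);
    }

    // O(log n) lookup once the list is kept sorted
    int indexOf(long id) {
        return Collections.binarySearch(ids, id);
    }
}
```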
Upvotes: 0
Reputation: 5357
Using a LinkedHashMap, which combines a doubly linked list with hash access, you should be able to interface with the TableModel as you would with an ArrayList, but also access the entries via a hash lookup on TransactionID.
You can even replace (e.g. update) based on a key without affecting the iteration order.
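A small sketch of both properties together (the EventModel class and String values are illustrative stand-ins for TransactionEvent):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Iteration order follows insertion order (for the table rows), while
// get() is an O(1) hash lookup by transaction ID.
class EventModel {
    private final Map<Long, String> events = new LinkedHashMap<Long, String>();

    void add(long txId, String event) { events.put(txId, event); }
    String byId(long txId)            { return events.get(txId); }
    Iterable<String> inRowOrder()     { return events.values(); }
}
```

Note that re-putting an existing key updates the value but leaves its position in the iteration order untouched, which is exactly the "replace without affecting iteration order" behaviour.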
Upvotes: 3
Reputation: 405715
You could keep the ArrayList sorted by searching for the insertion point as you add each TransactionEvent. Collections.binarySearch returns the
index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size(), if all elements in the list are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
Once you search for the insertion point you can use the ArrayList add(int index, Object element) method instead of just adding to the end of the list as you would normally. This will slow down each insertion by a small factor, but it will enable you to use binary search for fast look-up.
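Putting the two steps together (the SortedEventList class and bare Long ids are illustrative; a real version would search on the event's transaction ID with a Comparator):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Keep the list sorted by decoding binarySearch's "(-(insertion point) - 1)"
// return value on a miss, then inserting at that index.
class SortedEventList {
    private final List<Long> ids = new ArrayList<Long>();

    void insert(long id) {
        int pos = Collections.binarySearch(ids, id);
        if (pos < 0) pos = -pos - 1;  // decode the insertion point
        ids.add(pos, id);             // O(log n) search + O(n) shift
    }

    boolean contains(long id) {
        return Collections.binarySearch(ids, id) >= 0; // fast look-up
    }
}
```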
Upvotes: 3