Reputation: 10768
I need to repeatedly (hundred of thousands of times) retrieve an element (different each time) from a Collection
which contains dozens of thousand of Objects.
What is the quickest way to do this retrieval operation? At the moment my Collection
is a List
and I iterate on it until I have found the element, but is there a quicker way? Using a Map
maybe? I was thinking to do:
Map
, with the key being the id field of the Object, and the Object itself being the value.get(id)
on the Map
should be much faster than looping through a List
. HashMap
or TreeMap
? - my objects have no particular ordering.Any advice on the matter would be appreciated!
Last note: if an external library provides a tool to answer this I'd take it gladly!
Upvotes: 4
Views: 2349
Reputation: 22241
I've created code which tests the proformance of BinarySearch, TreeMap and HashMap for the given problem.
In case you are rebuilding the collection each time, HashMap is the fastest (even with standard Object's hashCode() implementation!), sort+array binary search goes second and a TreeMap is last (due to complex rebuilding procedure).
proc array: 2395
proc tree : 4413
proc hash : 1325
If you are not rebuilding the collection, HashMap is still the fastest, an array's binary search is second and a TreeMap is the slowest, but with only slightly lower speed than array.
proc array: 506
proc tree : 561
proc hash : 122
Test code:
public class SearchSpeedTest {
private List<DataObject> data;
private List<Long> ids;
private Map<Long, DataObject> hashMap;
private Map<Long, DataObject> treeMap;
private int numRep;
private int dataAmount;
private boolean rebuildEachTime;
static class DataObject implements Comparable<DataObject>{
Long id;
public DataObject(Long id) {
super();
this.id = id;
}
public DataObject() {
// TODO Auto-generated constructor stub
}
@Override
public final int compareTo(DataObject o) {
return Long.compare(id, o.id);
}
public Long getId() {
return id;
}
public void setId(Long id) {
this.id = id;
}
public void dummyCode() {
}
}
@FunctionalInterface
public interface Procedure {
void execute();
}
public void testSpeeds() {
rebuildEachTime = true;
numRep = 100;
dataAmount = 60_000;
data = new ArrayList<>(dataAmount);
ids = new ArrayList<>(dataAmount);
Random gen = new Random();
for (int i=0; i< dataAmount; i++) {
long id = i*7+gen.nextInt(7);
ids.add(id);
data.add(new DataObject(id));
}
Collections.sort(data);
treeMap = new TreeMap<Long, DataObject>();
populateMap(treeMap);
hashMap = new HashMap<Long, SearchSpeedTest.DataObject>();
populateMap(hashMap);
Procedure[] procedures = new Procedure[] {this::testArray, this::testTreeMap, this::testHashMap};
String[] names = new String[] {"array", "tree ", "hash "};
for (int n=0; n<procedures.length; n++) {
Procedure proc = procedures[n];
long startTime = System.nanoTime();
for (int i=0; i<numRep; i++) {
if (rebuildEachTime) {
Collections.shuffle(data);
}
proc.execute();
}
long endTime = System.nanoTime();
long diff = endTime - startTime;
System.out.println("proc "+names[n]+":\t"+(diff/1_000_000));
}
}
void testHashMap() {
if (rebuildEachTime) {
hashMap = new HashMap<Long, SearchSpeedTest.DataObject>();
populateMap(hashMap);
}
testMap(hashMap);
}
void testTreeMap() {
if (rebuildEachTime) {
treeMap = new TreeMap<Long, SearchSpeedTest.DataObject>();
populateMap(treeMap);
}
testMap(treeMap);
}
void testMap(Map<Long, DataObject> map) {
for (Long id: ids) {
DataObject ret = map.get(id);
ret.dummyCode();
}
}
void populateMap(Map<Long, DataObject> map) {
for (DataObject dataObj : data) {
map.put(dataObj.getId(), dataObj);
}
}
void testArray() {
if (rebuildEachTime) {
Collections.sort(data);
}
DataObject key = new DataObject();
for (Long id: ids) {
key.setId(id);
DataObject ret = data.get(Collections.binarySearch(data, key));
ret.dummyCode();
}
}
public static void main(String[] args) {
new SearchSpeedTest().testSpeeds();
}
}
Upvotes: 3
Reputation: 7496
You can use a map if you have a good way to define the key of the map. In the worst case you can use your object as key and value.
As ordering is not important use a HashMap
. To maintain the order in a TreeMap
there is additional cost when adding an element, as it must be added at the correct position.
Upvotes: 0
Reputation: 1859
HashMap will be more efficient in general, so use it whenever you don't care about the order of the keys.
when you want to keep your entries in Map sorted by key than use TreeMap but sorting will be overhead in your case as you dont want any particulate order.
Upvotes: 2
Reputation: 52185
As per the documentation of the Tree Map
(emphasis my own):
The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
In your case, you state that the items have no particular order and it does not seem that you are after any particular order, but rather just be able to retrieve data as fast as possible.
HashMaps
provide constant read time but do not guarantee order, so I think that you should go with HashMaps
:
This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time. This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
As a side note, the memory footprint of this can get quite high quite fast, so it might also be a good idea to look into a database approach and maybe use a cache like mechanism to handle more frequently used information.
Upvotes: 5