Reputation: 2775
Using Java, assuming v1.6.
I have a collection where the unique index is a string and the non-unique value is an int. I need to perform thousands of lookups against this collection as fast as possible.
I currently am using a HashMap<String, Integer>
but I am worried that the boxing/unboxing of the Integer to int is making this slower.
I had thought of using an ArrayList<String>
coupled with an int[]
.
i.e. Instead of:
int value = (int) HashMap<String, Integer>.get("key");
I could do
int value = int[ArrayList<String>.indexOf("key")];
Any thoughts? Is there a faster way to do this?
p.s. I will only build the collection once and maybe modify it once but each time I will know the size so I can use String[]
instead of ArrayList
but not sure there's a faster way to replicate indexOf...
Upvotes: 2
Views: 5316
Reputation: 3545
The map access does not do unboxing for the lookup, only the later access to the result makes it slow.
I suggest to introduce a small wrapper with a getter for the int, such as SimpleInt. It holds the int without conversion. The constructor is not expensive and overall is is cheaper than an Integer.
public SimpleInt
{
private final int data;
public SimpleInt(int i)
{
data = i;
}
// getter here
....
}
Upvotes: 0
Reputation: 49321
If the cost of building the map once and once only doesn't matter, you might want to look at perfect hashing, for example Bob Jenkins' code.
Upvotes: 1
Reputation: 1502016
Unboxing is fast - no allocations are required. Boxing is a potentially slower, as it needs to allocate a new object (unless it uses a pooled one).
Are you sure you've really got a problem though? Don't complicate your code until you've actually proved that this is a significant hit. I very much doubt that it is.
There are collection libraries available for primitive types, but I would stick to the normal HashMap from the JRE until you've profiled and checked that this is causing a problem. If it really only is thousands of lookups, I very much doubt it'll be a problem at all. Likewise if you're lookup-based rather than addition-based (i.e. you fetch more often than you add) then the boxing cost won't be particularly significant, just unboxing, which is cheap.
I would suggest using intValue()
rather than the cast to convert the value to an int
though - it makes it clearer (IMO) what's going on.
EDIT: To answer the question in the comment, HashMap.get(key)
will be faster than ArrayList.indexOf(key)
when the collection is large enough. If you've actually only got five items, the list may well be faster. I assume that's not really the case though.
If you really, really don't want the boxing/unboxing, try Trove (TObjectHashMap). There's also COLT to consider, but I couldn't find the right type in there.
Upvotes: 13
Reputation: 192005
Since you say it is indeed a bottleneck, I'll suggest Primitive Collections for Java; in particular, the ObjectKeyIntMap looks like exactly what you want.
Upvotes: 1
Reputation: 61536
Any performance gain that you get from not having to box/unbox is significanlty erased by the for loop that you need to go with the indexOf method.
Use the HashMap. Also you don't need the (int) cast, the compiler will take care of it for you.
The array thing would be ok with a small number of items in the array, but then so is the HashMap...
The only way you could make it fast to look up in an array (and this is not a real suggestion as it has too many issues) is if you use the hashCode of the String to work with as the index into the array - don't even think about doing that though! (I only mention it because you might find something via google that talks about it... if they don't explain why it is bad don't read any more about it!)
Upvotes: 3
Reputation: 147164
List.indexOf
will do a linear scan of the list - O(n) typically. A binary search will do the job in O(log n). A hash table will do it in O(1).
Having large numbers of Integer
objects in memory could be a problem. But then the same is true for String
s (both the String
and char[]
). You could do you own custom DB-style implementation, but I suggest benchmarking first.
Upvotes: 0
Reputation: 24272
I think scanning your ArrayList to find the match for your "key" is going to be much slower than your boxing/unboxing concerns.
Upvotes: 1
Reputation: 88806
One slight problem here: You can have duplicate elements in a List. If you really want to do it the second way, consider using a Set instead.
Having said that, have you done a performance test on the two to see if one is faster than the other?
Edit: Of course, the most popular Set type (HashSet) is itself backed by a HashMap, so switching to a set may not be such a wise change after all.
Upvotes: 0
Reputation: 75724
I would guess that the HashMap would give a much faster lookup, but I think this needs some benchmarking to answer correctly.
EDIT: Furthermore, There is no boxing involved, merely unboxing of the already-stored objects, which should be pretty fast, since no object allocation is done in that step. So, I don't think this would give you any more speed, but you should run benchmarks nonetheless.
Upvotes: 1