Adrian Hope-Bailie
Adrian Hope-Bailie

Reputation: 2775

String indexed collection in Java

Using Java, assuming v1.6.

I have a collection where the unique index is a string and the non-unique value is an int. I need to perform thousands of lookups against this collection as fast as possible.

I currently am using a HashMap<String, Integer> but I am worried that the boxing/unboxing of the Integer to int is making this slower.

I had thought of using an ArrayList<String> coupled with an int[].

i.e. Instead of:

int value = (int) HashMap<String, Integer>.get("key");

I could do

int value = int[ArrayList<String>.indexOf("key")];

Any thoughts? Is there a faster way to do this?

p.s. I will only build the collection once and maybe modify it once but each time I will know the size so I can use String[] instead of ArrayList but not sure there's a faster way to replicate indexOf...

Upvotes: 2

Views: 5316

Answers (9)

ReneS
ReneS

Reputation: 3545

The map access does not do unboxing for the lookup, only the later access to the result makes it slow.

I suggest to introduce a small wrapper with a getter for the int, such as SimpleInt. It holds the int without conversion. The constructor is not expensive and overall is is cheaper than an Integer.

public SimpleInt
{
    private final int data;

    public SimpleInt(int i)
    {
        data = i;
    }

    // getter here
    ....
}

Upvotes: 0

Pete Kirkham
Pete Kirkham

Reputation: 49321

If the cost of building the map once and once only doesn't matter, you might want to look at perfect hashing, for example Bob Jenkins' code.

Upvotes: 1

Jon Skeet
Jon Skeet

Reputation: 1502016

Unboxing is fast - no allocations are required. Boxing is a potentially slower, as it needs to allocate a new object (unless it uses a pooled one).

Are you sure you've really got a problem though? Don't complicate your code until you've actually proved that this is a significant hit. I very much doubt that it is.

There are collection libraries available for primitive types, but I would stick to the normal HashMap from the JRE until you've profiled and checked that this is causing a problem. If it really only is thousands of lookups, I very much doubt it'll be a problem at all. Likewise if you're lookup-based rather than addition-based (i.e. you fetch more often than you add) then the boxing cost won't be particularly significant, just unboxing, which is cheap.

I would suggest using intValue() rather than the cast to convert the value to an int though - it makes it clearer (IMO) what's going on.

EDIT: To answer the question in the comment, HashMap.get(key) will be faster than ArrayList.indexOf(key) when the collection is large enough. If you've actually only got five items, the list may well be faster. I assume that's not really the case though.

If you really, really don't want the boxing/unboxing, try Trove (TObjectHashMap). There's also COLT to consider, but I couldn't find the right type in there.

Upvotes: 13

Michael Myers
Michael Myers

Reputation: 192005

Since you say it is indeed a bottleneck, I'll suggest Primitive Collections for Java; in particular, the ObjectKeyIntMap looks like exactly what you want.

Upvotes: 1

TofuBeer
TofuBeer

Reputation: 61536

Any performance gain that you get from not having to box/unbox is significanlty erased by the for loop that you need to go with the indexOf method.

Use the HashMap. Also you don't need the (int) cast, the compiler will take care of it for you.

The array thing would be ok with a small number of items in the array, but then so is the HashMap...

The only way you could make it fast to look up in an array (and this is not a real suggestion as it has too many issues) is if you use the hashCode of the String to work with as the index into the array - don't even think about doing that though! (I only mention it because you might find something via google that talks about it... if they don't explain why it is bad don't read any more about it!)

Upvotes: 3

Tom Hawtin - tackline
Tom Hawtin - tackline

Reputation: 147164

List.indexOf will do a linear scan of the list - O(n) typically. A binary search will do the job in O(log n). A hash table will do it in O(1).

Having large numbers of Integer objects in memory could be a problem. But then the same is true for Strings (both the String and char[]). You could do you own custom DB-style implementation, but I suggest benchmarking first.

Upvotes: 0

Robin
Robin

Reputation: 24272

I think scanning your ArrayList to find the match for your "key" is going to be much slower than your boxing/unboxing concerns.

Upvotes: 1

Powerlord
Powerlord

Reputation: 88806

One slight problem here: You can have duplicate elements in a List. If you really want to do it the second way, consider using a Set instead.

Having said that, have you done a performance test on the two to see if one is faster than the other?

Edit: Of course, the most popular Set type (HashSet) is itself backed by a HashMap, so switching to a set may not be such a wise change after all.

Upvotes: 0

soulmerge
soulmerge

Reputation: 75724

I would guess that the HashMap would give a much faster lookup, but I think this needs some benchmarking to answer correctly.

EDIT: Furthermore, There is no boxing involved, merely unboxing of the already-stored objects, which should be pretty fast, since no object allocation is done in that step. So, I don't think this would give you any more speed, but you should run benchmarks nonetheless.

Upvotes: 1

Related Questions