EurikaIam
EurikaIam

Reputation: 136

What is wrong with Map<Object, Collection<Object>>?

I'm trying to build a data structure in Java where I'll be inserting about 200,000 Keys of strings each with "an average" of 1000 Integers Map<String, Arraylist<Integer>>. The map will eventually have around 200 millions of values.

The problem is that While inserting, I have to first check if the key is existed in the map, if true, get all values stored in a temp collection then add the new integer to the collection and put them back to map, or instantiate a new collection with a new integer.

This is so slow when I get to the point where a collection contains around 50000 integers. I usually got a java out of heap space error.

Is there a way to get rid of the get process? where I only check for the key existence and then immediately add the value to the existed collection, something like posh to a stack, especially that the map is in the memory, or is it what makes the difference between Java and C++, where in C++ I can benefit from using pointers?

Keeping the fact that I dont prefer to increase the size of the map by using things like multimaps, as the structure seems almost straightforward.

Many thanks in advance.

Upvotes: 1

Views: 212

Answers (2)

Grzegorz Olszewski
Grzegorz Olszewski

Reputation: 1418

  1. There is a huge performance overhead related to the HashMap resize. When you create new HashMap with no-arg constructor, its size is by default 16. You put more and more elements in it, so any time you exceed available space, it needs to resize. Resizing involves calculating hash codes for every key and moving keys between hashtables. It's very expensive.

If you know your HashMap will store a lot of keys, you can create it with size of, for instance, 200,000.

  1. ArrayList has default capacity of ten. If you put more elements, it needs to resize. This involves creating new array (in which ArrayList internally stores elements) and copy elements from the old array to the new one. This also can be very expensive on large ArrayLists.

I would suggest using LinkedList instead. Adding new elements to it is really cheap, as elements are stored as independent nodes. However, there are some drawback. Please see this question for details.

  1. You have to be able to reserve enough memory for 200,000,000 objects. As Tom Hawtin suggested, increasing maximum heap space used by JVM may be necessary. Java is not C++, you can't just use more and more memory.

Upvotes: 2

Bob Kuhar
Bob Kuhar

Reputation: 11140

If your code is actually doing what your question suggests, you are working too hard. Once you've got your Key associated with the ArrayList. Just get the ArrayList out of the map and add the new integer to that list. You don't need to "put it back". The reference to the List is all you need to change the List.

    Map<String, ArrayList<Integer>> m = new HashMap<String, ArrayList<Integer>>();
    for ( int i = 0; i < 5; i++ ) {
        String key = ( i % 2 == 0 ) ? "Bob" : "Robert";
        ArrayList<Integer> l = m.get( key );
        if ( l == null ) {
            l = new ArrayList<Integer>();
            m.put( key, l );
        }
        l.add( i );
    }
    System.out.println( "m is " + m );

In my opinion, though, Guava Multimap is a far better solution to this problem: http://guava-libraries.googlecode.com/svn/tags/release03/javadoc/com/google/common/collect/Multimap.html

Upvotes: 5

Related Questions