peterk

Reputation: 5444

What is the relative memory overhead per mapped value of the different Java map types

I have a case where I have "property maps", one attached to each object in a very large collection. I am interested only in the overhead of the map itself, not of the keys and values it holds.

Many have a small number of properties, i.e. fewer than 5, and many have none. For empty maps I use a singleton "empty instance" as an optimization.

For speed, at a count of <= 5 a TreeMap seems equal to or better than a HashMap for retrieval, and it "auto-trims" when nodes are removed. I would think a HashMap pays, per entry, for its share of the hash table array (including the null slots kept free by the load factor), plus the object overhead of the Entry and the references it stores for the key, the value, and any internal fields.

Also, I would think a TreeMap uses only the per-entry object overhead plus the key, value, and link references in each Entry. The smallest possible layout would be a binary search over a single array object (trimmed to size) holding just the references to the keys and values.
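As a back-of-the-envelope sketch of those per-entry costs (assuming a 64-bit JVM with compressed oops: a 12-byte object header, 4-byte references, and 8-byte alignment — these are assumptions, not measurements):

```java
// Rough per-entry size estimates for HashMap and TreeMap on a 64-bit JVM
// with compressed oops. Field layouts match the OpenJDK sources, but the
// byte counts here are estimates, not measured values.
public class EntryOverhead {
    // Round an instance size up to the JVM's 8-byte alignment.
    static long align8(long n) { return (n + 7) & ~7L; }

    public static void main(String[] args) {
        // HashMap.Node: 12-byte header + int hash + key, value, next references
        long hashMapNode = align8(12 + 4 + 4 + 4 + 4);              // 32 bytes
        // plus one table slot, amortized over the default 0.75 load factor
        long hashMapPerEntry = hashMapNode + Math.round(4 / 0.75);  // ~37 bytes

        // TreeMap.Entry: 12-byte header + key, value, left, right, parent
        // references + boolean color
        long treeMapEntry = align8(12 + 4 * 5 + 1);                 // 40 bytes

        System.out.println("HashMap ~" + hashMapPerEntry + " bytes/entry");
        System.out.println("TreeMap ~" + treeMapEntry + " bytes/entry");
    }
}
```

By this estimate the two are within a few bytes of each other per entry, which is why the dominant saving at very small sizes comes from avoiding the table array entirely rather than from choosing between the two.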

Has anyone really measured the accurate full overhead of each of the Java map types? For my bazillions of objects with properties, it would make a significant difference in the 1-to-5-property case without impacting access speed.

Upvotes: 2

Views: 506

Answers (4)

peterk

Reputation: 5444

Ok - I looked at the Java source, and yes, TreeMap does use a fair amount of memory per node. So I made two map implementations and tested them.

The first is a simple singly linked list: the parent has only two fields, MyMap { Entry root; int size; }, and each Entry holds { key, value, next }. For sizes <= 8 it turned out faster than HashMap and was easy to code, and at smaller sizes it was linearly faster.
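A minimal sketch of such a list map, including the promotion to HashMap past the size threshold described below (the names `ListMap` and `PROMOTE_AT` are hypothetical, and null values are not supported):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the singly-linked-list property map: one small parent object
// plus one Entry per mapping, with no hash table array.
class ListMap<K, V> {
    static final int PROMOTE_AT = 8; // switch to HashMap above this size

    static class Entry<K, V> {
        final K key; V value; Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    Entry<K, V> root;
    int size;

    V get(K key) {
        for (Entry<K, V> e = root; e != null; e = e.next)
            if (key.equals(e.key)) return e.value;
        return null;
    }

    /** Returns this map, or a HashMap copy once size passes PROMOTE_AT. */
    Object put(K key, V value) {
        for (Entry<K, V> e = root; e != null; e = e.next)
            if (key.equals(e.key)) { e.value = value; return this; } // replace in place
        root = new Entry<>(key, value, root);
        size++;
        if (size <= PROMOTE_AT) return this;
        // Too big for a linear scan: copy into a HashMap and hand that back.
        Map<K, V> m = new HashMap<>();
        for (Entry<K, V> e = root; e != null; e = e.next) m.put(e.key, e.value);
        return m;
    }
}
```

The caller stores whatever `put` returns, so the switch between representations costs one field assignment at the owning object.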

I also tried the interleaved key/value array with binary search. It performed worse than the simple list: the overhead of invoking Comparable.compareTo() on the keys ate all the gains, so it was no faster than HashMap at small sizes and got much worse as it grew.

So for my case the simplest thing wins. I replace the list map with a HashMap when it grows beyond 8 entries, and if the HashMap drops below a size of 5 I revert back to the list map.

And you are right about puts. In this application there is probably only one put for every several hundred gets, and most puts just replace the value in an existing entry rather than discard it.

Upvotes: 0

maaartinus

Reputation: 46482

Assuming you want to optimize really hard and need a full-fledged Map, you can encapsulate an Object[] in your own AbstractMap subclass and store all the keys and values there, interleaved. You only need to implement a single method, entrySet(), for which you can use an AbstractSet. But then it gets worse: all the mutating methods throw by default, so you'd have to inspect them all... which might take a whole day.

With Guava's testing library (guava-testlib) you could ensure that your map really works. Without really thorough testing, I'd recommend sticking with your own tiny interface, as Map is pretty huge.
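A read-only sketch of this idea, under the assumption that the maps are built once and then only queried (the name `ArrayMap` is hypothetical). Only `entrySet()` is implemented; `get`, `size`, and `containsKey` fall back to AbstractMap's linear scans, and the mutating methods keep their throwing defaults:

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Set;

// Immutable Map backed by a single interleaved array [k0, v0, k1, v1, ...].
// Per map, the only overhead beyond the payload references is one object
// header for the map and one for the array.
class ArrayMap<K, V> extends AbstractMap<K, V> {
    private final Object[] kv; // keys at even indices, values at odd

    ArrayMap(Object... keysAndValues) { this.kv = keysAndValues; }

    @Override public Set<Entry<K, V>> entrySet() {
        return new AbstractSet<Entry<K, V>>() {
            @Override public int size() { return kv.length / 2; }
            @Override public Iterator<Entry<K, V>> iterator() {
                return new Iterator<Entry<K, V>>() {
                    private int i = 0;
                    @Override public boolean hasNext() { return i < kv.length; }
                    @SuppressWarnings("unchecked")
                    @Override public Entry<K, V> next() {
                        Entry<K, V> e =
                            new SimpleImmutableEntry<>((K) kv[i], (V) kv[i + 1]);
                        i += 2;
                        return e;
                    }
                };
            }
        };
    }
}
```

Note that AbstractMap's inherited `get` is O(n) over the entry set, which is the trade-off being made here: minimal memory at small sizes in exchange for linear lookup.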

Upvotes: 2

leventov

Reputation: 15313

If your property maps are not updated actively, but built once and queried for a more or less long time, I would propose a Java-specific memory-saving architecture:

public interface DynamicMap<K, V> {
    DynamicMap<K, V> add(K key, V value);
    V get(K key);
}

class WrappingDynamicMap<K, V> implements DynamicMap<K, V> {
    DynamicMap<K, V> delegate;
    public void put(K key, V value) { delegate = add(key, value); }

    @Override public DynamicMap<K, V> add(K key, V value) {
        if (delegate == null) return new Map1<>(key, value);
        return delegate.add(key, value);
    }

    @Override public V get(K key) {
        return delegate == null ? null : delegate.get(key);
    }
}

class Map1<K, V> implements DynamicMap<K, V> {
    K k1; V v1;
    Map1(K k1, V v1) { this.k1 = k1; this.v1 = v1; }

    boolean putThis(K key, V value) {
        if (key.equals(k1)) { v1 = value; return true; }
        return false;
    }

    @Override public DynamicMap<K, V> add(K key, V value) {
        if (putThis(key, value)) return this;
        return new Map2<>(this, key, value);
    }

    @Override public V get(K key) { return key.equals(k1) ? v1 : null; }
}

class Map2<K, V> extends Map1<K, V> {
    K k2; V v2;
    Map2(Map1<K, V> m, K k2, V v2) {
        super(m.k1, m.v1); this.k2 = k2; this.v2 = v2;
    }

    @Override boolean putThis(K key, V value) {
        if (super.putThis(key, value)) return true;
        if (key.equals(k2)) { v2 = value; return true; }
        return false;
    }

    @Override public DynamicMap<K, V> add(K key, V value) {
        if (putThis(key, value)) return this;
        HashMap<K, V> hm = new HashMap<>();
        hm.put(k1, v1); hm.put(k2, v2); hm.put(key, value);
        return new BigMap<>(hm);
    }

    @Override public V get(K key) {
        V v = super.get(key);
        return v != null ? v : (key.equals(k2) ? v2 : null);
    }
}

// Map3, Map4, Map5

class BigMap<K, V> implements DynamicMap<K, V> {
    private HashMap<K, V> impl;
    BigMap(HashMap<K, V> impl) { this.impl = impl; }

    @Override public DynamicMap<K, V> add(K key, V value) {
        impl.put(key, value); return this;
    }

    @Override public V get(K key) { return impl.get(key); }
}

If you need the classic Map interface, use WrappingDynamicMap with put(). If you can afford to reassign the property-map field within the object on each put, you can use the DynamicMap implementations directly with add(); that saves one dereference on each query and 16-24 bytes on the heap for each property map.

Advantages of this approach:

  • I bet this would be much faster than HashMap/TreeMap at sizes 1..3-5, depending on the complexity of your keys' equals()
  • It uses the absolute minimum memory to hold the data
  • It can easily be specialized for primitive keys/values

Disadvantages:

  • GC pressure on map puts/removes

I hope it's clear how removes could be implemented for Map1/Map2/../BigMap. BigMap could also monitor impl.size() and turn back into a Map5 eventually.

Additional opportunities:

  • For sizes between 5 and 12, a very simple open-addressing hash table implementation (with a constant table size of 16, so the modulo is a bit mask) could be used instead of HashMap, with the size stored as a byte.
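The open-addressing bullet above might be sketched like this (the name `Fixed16Map` and the cap of 12 entries are assumptions; removal and resizing are omitted):

```java
// Fixed 16-slot open-addressing table with linear probing. The table size
// is a constant power of two, so the "modulo 16" is just (hash & 15), and
// size fits in a byte. Capped at 12 entries so a null slot always
// terminates every probe sequence.
class Fixed16Map<K, V> {
    private final Object[] keys = new Object[16];
    private final Object[] values = new Object[16];
    private byte size;

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        for (int i = key.hashCode() & 15; ; i = (i + 1) & 15) {
            if (keys[i] == null) {
                if (size == 12) throw new IllegalStateException("table too full");
                keys[i] = key; values[i] = value; size++;
                return null;
            }
            if (keys[i].equals(key)) { // existing key: replace in place
                V old = (V) values[i];
                values[i] = value;
                return old;
            }
        }
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        for (int i = key.hashCode() & 15; keys[i] != null; i = (i + 1) & 15)
            if (keys[i].equals(key)) return (V) values[i];
        return null;
    }

    int size() { return size; }
}
```

With no Entry objects at all, the per-map cost is two 16-slot arrays regardless of occupancy, which beats HashMap once more than a handful of entries are present.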

Upvotes: 2

Louis Wasserman

Reputation: 198371

https://code.google.com/p/memory-measurer/wiki/ElementCostInDataStructures

TreeMap and HashMap use approximately the same amount of memory per entry, though it's true that TreeMap has a slightly smaller constant overhead for the map itself.

Upvotes: 3
