Ulrich Scholz
Ulrich Scholz

Reputation: 2111

How to save space with maps

I have to store large amounts of data in maps and the total size is critical. The number of maps is high, the size of each individual map is small (<10 mappings for most of them) and the maps do not change after creation.

I see two ways (let's assume I know that n mappings will be stored):

  1. Use a HashMap of initial size n and load factor 1
  2. Use an ArrayList of size n, store (Key, Value) pairs. Implement a get() method just as with Map

Is there a better way (maybe Guava ImmutableMap)?

Upvotes: 2

Views: 1344

Answers (3)

Joop Eggen
Joop Eggen

Reputation: 109613

See Perfect hash function

For a map where no longer keys are added, one could go for an optimized hash function: an array as small as feasible, and collisions with minimal influence.

Besides academical treatises out there, a hash function can be build from n different smaller functions/value entities, and an optimal can be found by trying a combination on the data set. And with varying array sizes.

As this area is too broad (like rehashing), search further, or do it yourself.


If you have gotten many values, taking the same Object instance instead of having many different Objects that are equal. This is done with an identity map Map<T, T> using only the first put key.


  • A TreeMap is for large data as inefficient as a LinkedList w.r.t. ArrayList.
  • The implementation of HashMap is quite interesting for efficiency. At the end one could do the same as ArrayList.trimToSize() though probably irrelevant.

Upvotes: 1

DNA
DNA

Reputation: 42617

There are many ways to do this, and it's difficult to predict the space requirement exactly (depends on e.g. Java object overheads and packing, and the architecture you are on!). You may need to do some memory benchmarking with different approaches, using actual (or representative) data.

One way is to use pairs of arrays (one for keys, one for values).

Another idea is to use a single outer Map, but combine a key to address each inner map with the key for each inner value. This way, you avoid the overhead of lots of small Maps.

So for our first Map of:

"one" -> 1
"two" -> 2

and a second Map of

"three" -> 3

we store all the entries in a single Map, for example:

"1-one" -> 1
"1-two" -> 2
"2-three" -> 3

(You can probably use a similar idea using one large array or ArrayList, storing pairs of values, if you can store the pairs in a sorted manner so that you can find them efficiently using binary search). Or a pair of arrays/ArrayLists so you don't need to wrap key/value into a Pair object.

Upvotes: 0

Michael Borgwardt
Michael Borgwardt

Reputation: 346536

If space is your primary concern, then using ArrayList or even plain arrays would be best. You'll have to test whether the serial lookup causes significant performance degradation; most likely it won't. If your data is Comparable, you could use a binary search, but I doubt whether that would actually help at such a small size.

One thing to be concerned about is whether you could sometimes have significantly larger maps; it might be a good idea to add a check and use a regular HashMap in that case.

Upvotes: 0

Related Questions