xpages-noob

Reputation: 1579

Flatten multidimensional HashMaps for better performance?

I frequently use multidimensional HashMaps, i.e. HashMaps containing HashMaps. On a two-key basis, for example, I set/get a stored value with

hashmapMulti.get(key1).put(key2,x);
hashmapMulti.get(key1).get(key2);

However, I could also use a "flat" hashmap and combine the two keys to set/get a value:

hashmapFlat.put(key1+"|"+key2,x);
hashmapFlat.get(key1+"|"+key2);

If I'm informed correctly, the time complexity of put and get on a HashMap should be "more or less" O(1). With the flattening, I basically trade the cost of one extra get (constant time) for the cost of concatenating three strings.
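For reference, here are the two approaches side by side. Note that the nested version as written assumes the inner map for key1 already exists; computeIfAbsent (Java 8+) can guard against that:

```java
import java.util.HashMap;
import java.util.Map;

public class MapStyles {
    public static void main(String[] args) {
        // Nested maps: computeIfAbsent creates the inner map on first
        // use, avoiding a NullPointerException on get(key1)
        Map<String, Map<String, Integer>> nested = new HashMap<>();
        nested.computeIfAbsent("key1", k -> new HashMap<>()).put("key2", 42);
        Integer fromNested = nested.get("key1").get("key2");

        // Flat map with a combined key: one lookup, but a string
        // concatenation on every access
        Map<String, Integer> flat = new HashMap<>();
        flat.put("key1" + "|" + "key2", 42);
        Integer fromFlat = flat.get("key1" + "|" + "key2");

        System.out.println(fromNested + " " + fromFlat); // 42 42
    }
}
```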

Which way is faster?

Does the best choice depend on the number of objects stored in the HashMap(s)?

Upvotes: 3

Views: 937

Answers (4)

weston

Reputation: 54801

Which way is faster?

You would need to profile. I would rather do one look up by default (using biziclop's suggestion where I have more than one key) and I would only consider otherwise if there was a proven performance problem.

Does the best choice depend on the number of objects stored in the HashMap(s)?

Yes, but you can manage it so that one is as good as two for any number of objects:

HashMaps have a number of buckets. The hash value from the key, a 32-bit value, is mapped to a much smaller range to select the bucket. This means that keys with different hashes can share a bucket. When keys share a bucket, performance falls because a linear search of the bucket takes place.

The worst case is a hash function that returns a constant, causing all keys to map to a single bucket; the best case is one that leads to an even distribution of key-value pairs across the buckets.

The number of buckets (the HashMap's capacity) can be increased, which, combined with a good hash function, can minimise the sharing of buckets.

Read this and pay attention to the advice on getting the capacity right: http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html
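As a rough sketch of that capacity advice: with the default load factor of 0.75, sizing the map so that the expected entry count stays below capacity × 0.75 avoids rehashing as the map fills (the entry count here is just an example):

```java
import java.util.HashMap;
import java.util.Map;

public class CapacitySizing {
    public static void main(String[] args) {
        int expectedEntries = 10_000;
        // Default load factor is 0.75: pick a capacity large enough
        // that expectedEntries / capacity never exceeds it, so the
        // map is never forced to resize and rehash while filling up
        int capacity = (int) (expectedEntries / 0.75f) + 1;
        Map<String, Integer> map = new HashMap<>(capacity);
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 10000
    }
}
```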

Upvotes: 2

Stephen C

Reputation: 719229

Which way is faster?

You need to benchmark it ... using a benchmark that closely mirrors what your real application will be doing. Your actual application running with real data would be the ideal benchmark.

The problem has too many variables for a simple analysis to be plausible. Consider:

  • If you use two layers of nested maps, then each lookup involves two sets of hash calculations, array probes, and hash chain searches.

  • But on the flip side, using a combined key will most likely entail a string concatenation each time you want to do a lookup. In addition, if we assume that the key strings used for the lookups are temporary, the String class's hashCode caching won't be effective.
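The hashCode caching point can be seen in a small sketch (the variable names are mine): String caches its hash internally after the first call, but a freshly concatenated key is a brand-new object, so the hash is recomputed on every lookup:

```java
public class HashCaching {
    public static void main(String[] args) {
        // A reused String computes its hash once; subsequent calls
        // return the value cached inside the String object
        String reused = "key1|key2";
        int h1 = reused.hashCode();
        int h2 = reused.hashCode(); // served from the cache

        // A fresh concatenation yields a new String each time, so the
        // hash cached during earlier lookups is never reused
        String k1 = "key1", k2 = "key2";
        int h3 = (k1 + "|" + k2).hashCode();

        // Equal strings hash equally, but h3 had to be recomputed
        System.out.println(h1 == h2 && h1 == h3); // true
    }
}
```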

Then there are the variables:

  • the ratio of lookups to other operations,
  • the total number of entries,
  • the number, and average lengths of the component key strings,
  • the degree to which the component (or combined) key strings are shared / reused,
  • the memory usage patterns of other parts of the application, and so on.

Finally, there is the difficulty of a priori modelling of second-order effects such as the performance impact of memory caches, virtual memory, and the garbage collector in the context of the application.


My advice would be to implement your complete application using one of the strategies, and then benchmark it (with real data) and profile it:

  • If benchmarking and profiling conclusively show that this part of your application is performance critical, then create a second version of the application using the alternative strategy.

  • Finally benchmark and profile the second version, and decide which one gives the best performance.

Upvotes: 1

Dan M

Reputation: 325

The get(key) approach (nested maps) is faster.

As you know, working with strings (especially concatenating them) is, performance-wise, EVIL, because in the end concatenating two strings means:

  • creating a new String object
  • a loop (cost: O(n)) over the first string
  • a loop (cost: O(n)) over the second string

(and in your example, you do that twice: once for get, once for put)

If the multidimensional HashMap fits your design and correctly represents what you are modelling, I don't see any drawback in using it.

If you have a large number of objects, the choice of a two-dimensional HashMap may add a little overhead to your memory footprint, but as I don't know your constraints (number of objects and memory available) I can't say whether you need to go for the flattening.

Upvotes: 2

biziclop

Reputation: 49794

A third option is to write a class that encapsulates the composite key.

It will have two fields for the two individual keys, and if you override its equals() and hashCode() methods correctly, you won't have to depend on string concatenation.

Although performance-wise your best option is to write actual benchmarks and compare your implementations, this is definitely the cleanest solution: it's instantly readable and it avoids the rather brittle dependency on string concatenation (a key that contains the | character would break the flat scheme).
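A minimal sketch of such a composite key (class and field names are mine): the class is immutable and overrides equals() and hashCode() consistently, so it works correctly as a HashMap key with no string concatenation at all:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// A minimal composite key: two immutable fields plus consistent
// equals() and hashCode(), so lookups need no string concatenation
// and keys containing '|' are unambiguous
final class PairKey {
    private final String key1;
    private final String key2;

    PairKey(String key1, String key2) {
        this.key1 = key1;
        this.key2 = key2;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PairKey)) return false;
        PairKey other = (PairKey) o;
        return key1.equals(other.key1) && key2.equals(other.key2);
    }

    @Override
    public int hashCode() {
        // Combines both field hashes; equal keys always agree
        return Objects.hash(key1, key2);
    }
}

public class CompositeKeyDemo {
    public static void main(String[] args) {
        Map<PairKey, Integer> map = new HashMap<>();
        map.put(new PairKey("a", "b"), 1);
        System.out.println(map.get(new PairKey("a", "b"))); // 1
    }
}
```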

Upvotes: 5
