deek0146

Reputation: 984

Is it more efficient to have a map of maps, or a very large map?

I am storing some objects in a map (keyed by strings), but the objects are also categorisable by another string. I could therefore make a map of these categories, and for each category keep another map of the objects in that category.

I will always know the category whenever I make an insertion or a get request against this data structure. Is this more efficient? It seems like it would be, but the lookup time of a map is O(log n), I believe, so what would the overall benefit be?
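
For concreteness, a minimal sketch of the two layouts being compared (Object is just a stand-in for the real type):

    #include <map>
    #include <string>
    #include <utility>

    struct Object { int payload; };  // stand-in for the real object type

    // Option A: one large map keyed by (category, name). std::pair compares
    // lexicographically, so no custom comparator is needed.
    std::map<std::pair<std::string, std::string>, Object> flat;

    // Option B: an outer map of categories, each holding an inner map of objects.
    std::map<std::string, std::map<std::string, Object>> nested;

    void insert(const std::string& cat, const std::string& name, const Object& obj) {
        flat[{cat, name}] = obj;   // one lookup over the combined key
        nested[cat][name] = obj;   // two lookups: category first, then name
    }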

Upvotes: 2

Views: 1125

Answers (3)

nsanders

Reputation: 12626

Just test what's faster using real data.

Statements like "lookup time of a map is log(n)" can be misleading. There is still a hidden constant factor at play, and if your data or your accesses are distributed in a non-uniform manner, as is often the case, statements about what is "best" get even more complicated.

When it comes to performance, few things work better than actual measurements using real data.
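
A minimal timing harness might look like the sketch below (the names here are placeholders; draw the keys from your real access pattern rather than random noise):

    #include <chrono>
    #include <cstddef>
    #include <vector>

    // Time a batch of lookups against whichever map layout is under test.
    template <typename Map, typename Key>
    long long time_lookups(const Map& m, const std::vector<Key>& keys) {
        std::size_t hits = 0;
        auto start = std::chrono::steady_clock::now();
        for (const auto& k : keys)
            hits += m.count(k);  // count() performs the actual tree search
        auto stop = std::chrono::steady_clock::now();
        volatile std::size_t sink = hits;  // keep the loop from being optimised away
        (void)sink;
        return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    }

For the two-level layout, time the two-step lookup (outer find, then inner find) inside the same loop so the comparison is apples to apples.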

Upvotes: 5

Bowie Owens

Reputation: 2976

"Is this more efficient? It seems like it would be"

My gut says that in the average case it will be less efficient for two reasons.

First, you are adding more keys. You have n1 objects you need to search. You also have n2 categories the objects are divided into. You now have n1 + n2 keys instead of n1.

Second, std::map is usually implemented as a balanced binary tree. The balanced part is critical to ensure that the worst-case lookup time is O(log n) rather than O(n). By moving to a two-layer structure you may be preventing the tree from being balanced: the split into categories fixes the shape of the top of the combined structure, regardless of how the objects are distributed beneath it. If your access to objects is uniform random, an unbalanced tree will have worse performance than a balanced one.

To illustrate, a perfectly balanced tree follows. Worst case: 3 comparisons.

       d
     /   \
   b       f
  / \     / \
 a   c   e   g

An unbalanced tree with the same data follows. Worst case: 4 comparisons.

    b 
  /   \
a       e
       / \
      d   f
     /     \
    c       g

That said, an unbalanced tree can be faster if what you are searching for is near the top of the tree, which is why you should listen to @nsanders when he says to profile.
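
One way to check such claims directly is to wrap the key type in a comparison counter, as in the sketch below; the counts will differ a little from the idealised trees above, since std::map is typically a red-black tree (approximately, not perfectly, balanced) and find() adds a final equality check:

    #include <iostream>
    #include <map>
    #include <string>

    // A key wrapper that counts every comparison std::map performs.
    struct CountedKey {
        std::string value;
        inline static long comparisons = 0;  // C++17 inline static member
        bool operator<(const CountedKey& other) const {
            ++comparisons;
            return value < other.value;
        }
    };

    int main() {
        std::map<CountedKey, int> m;
        for (std::string s : {"a", "b", "c", "d", "e", "f", "g"})
            m[{s}] = 0;
        CountedKey::comparisons = 0;
        m.find({"g"});
        std::cout << CountedKey::comparisons << " comparisons\n";
    }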

Upvotes: 3

David Nehme

Reputation: 21572

Doing the arithmetic: if you have n elements, lookup time is O(log(n)). If you split them into m1 maps, each of size m2, then you need O(log(m1)) to do the lookup in the first map and O(log(m2)) to do the lookup in the second map. But since

log(m1) + log(m2) = log(m1*m2) = log(n)
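
To put numbers on it: splitting n = 1,000,000 objects into m1 = 1,000 categories of m2 = 1,000 objects each costs about log2(1000) + log2(1000) ≈ 10 + 10 = 20 comparisons per lookup, which is the same as log2(1,000,000).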

You aren't buying much either way, so you should probably code it in whatever way best reveals your intentions.

As far as actual performance goes, you should profile it and see if it makes a difference. It's possible that the two maps will be faster, since each map has a simpler comparator function (each does one string comparison, while the comparator for the monolithic map has to do two for some pairs of keys). Also, if a lot of your lookups are likely not to find a match on the first string, then you will only be doing the first lookup, which might save some time.

On the other hand, if the sizes of the secondary maps are not uniform, then you could effectively end up doing two full lookups. As a worst case, consider that half the keys have identical first strings while the other keys all have distinct first strings.
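
To make the comparator point concrete, here is roughly what a monolithic map keyed by a pair of strings does on every comparison (a sketch; std::pair's built-in operator< behaves the same way):

    #include <string>
    #include <utility>

    // When two composite keys share the same first string, the comparator
    // must fall through to a second string comparison. The nested layout
    // instead does one comparison per tree level.
    struct PairLess {
        bool operator()(const std::pair<std::string, std::string>& a,
                        const std::pair<std::string, std::string>& b) const {
            if (a.first < b.first) return true;
            if (b.first < a.first) return false;
            return a.second < b.second;  // first strings equal: extra comparison
        }
    };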

Upvotes: 2
