sub

Reputation: 2733

C++ STL Map vs Vector speed

In the interpreter for my experimental programming language I have a symbol table. Each symbol consists of a name and a value (the value can be of type string, int, function, etc.).

At first I represented the table with a vector and iterated through the symbols, checking whether the given symbol name matched.

Then I thought that using a map, in my case map<string,symbol>, would be better than iterating through the vector all the time, but:

It's a bit hard to explain this part but I'll try.

When a variable is retrieved for the first time in a program in my language, its position in the symbol table of course has to be found (using the vector for now). If I iterated through the vector every time the line is executed (think of a loop), it would be terribly slow (as it currently is, nearly as slow as Microsoft's batch).

So I could use a map to retrieve the variable: SymbolTable[ myVar.Name ]

But think of the following: If the variable, still using vector, is found the first time, I can store its exact integer position in the vector with it. That means: The next time it is needed, my interpreter knows that it has been "cached" and doesn't search the symbol table for it but does something like SymbolTable.at( myVar.CachedPosition ).
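A minimal sketch of the caching idea described above, assuming a simplified symbol that holds only an int. The names `Symbol`, `VarRef` and `resolve` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical symbol type; a real interpreter's value would be richer than an int.
struct Symbol {
    std::string name;
    int value;
};

// Hypothetical reference to a variable as it appears in the program text.
struct VarRef {
    std::string name;
    bool cached = false;
    std::size_t cachedPosition = 0;
};

// Linear search on first use, direct indexing on every later use.
// Returns nullptr if the name is not in the table.
Symbol* resolve(std::vector<Symbol>& table, VarRef& ref) {
    if (ref.cached) {
        assert(ref.cachedPosition < table.size());
        return &table[ref.cachedPosition];
    }
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i].name == ref.name) {
            ref.cached = true;
            ref.cachedPosition = i;
            return &table[i];
        }
    }
    return nullptr;
}
```

The cached index stays correct when symbols are appended, but it would go stale if an earlier element were erased or the vector were reordered.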

Now my (rather hard?) question:

Upvotes: 21

Views: 43936

Answers (12)

Matthieu M.

Reputation: 299730

You effectively have a number of alternatives.

Libraries exist:

Critics

  • Map lookup and retrieval take O(log N), but the items may be scattered throughout memory, so they do not play well with caching strategies.
  • Vectors are more cache friendly; however, unless you sort the vector you'll have O(N) performance on find. Is that acceptable?
  • Why not use an unordered_map? It provides O(1) lookup and retrieval on average (though the constant may be high) and is certainly suited to this task. If you have a look at Wikipedia's article on hash tables, you'll realize that there are many strategies available, and you can certainly pick one that suits your particular usage pattern.
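As a rough illustration of the hash table option in the last bullet, here is a sketch using `std::unordered_map`. The `Symbol` struct and the `lookup` helper are assumptions made for the example, not part of the question.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical symbol type, reduced to a single int.
struct Symbol {
    int value;
};

using SymbolTable = std::unordered_map<std::string, Symbol>;

// Average O(1) lookup. Returns nullptr on a miss and never inserts.
const Symbol* lookup(const SymbolTable& table, const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}
```

Whether this beats a tree-based map for a given interpreter depends on key lengths, table size and the hash function, so it is worth measuring.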

Upvotes: 15

Puppy

Reputation: 146910

A map will scale much better, which will be an important feature. Also, don't forget that with a map you can (unlike with a vector) take pointers and references to elements, and they stay valid when other elements are inserted. In this case, you could "cache" variables with a map just as validly as with a vector. A map is almost certainly the right choice here.
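A small sketch of caching a pointer into a `std::map`, assuming a reduced `Symbol` type. The helper name `findSymbol` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical symbol type, reduced to a single int.
struct Symbol {
    int value;
};

using SymbolTable = std::map<std::string, Symbol>;

// Returns a pointer to the stored symbol, or nullptr if it is absent.
// The pointer stays valid while other symbols are inserted or erased;
// it is invalidated only when this particular entry is erased.
Symbol* findSymbol(SymbolTable& table, const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}
```

An interpreter could store the returned pointer next to the variable reference and skip the lookup on later executions of the same line.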

Upvotes: 0

Nick Dandoulakis

Reputation: 43110

For looking up values by a string key, the map data type is the appropriate one, as mentioned by other users.

STL map implementations are usually built on self-balancing trees, such as red-black trees, and their operations take O(log n) time.

My advice is to wrap the table manipulation code in functions, like table_has(name), table_put(name) and table_get(name).

That way you can easily change the inner symbol table representation if you experience slow run-time performance, and you can embed cache functionality in those routines later.
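One possible shape for such a wrapper, sketched as a small class rather than free functions. The class name `SymbolTable`, its method names and the reduced `Symbol` type are assumptions for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical symbol type, reduced to a single int.
struct Symbol {
    int value;
};

// Callers only see has/put/get, so the container behind them can be
// swapped (vector, map, hash table) without touching the interpreter.
class SymbolTable {
public:
    bool has(const std::string& name) const {
        return symbols_.find(name) != symbols_.end();
    }

    // Inserts a new symbol or overwrites an existing one.
    void put(const std::string& name, const Symbol& symbol) {
        symbols_[name] = symbol;
    }

    // Returns nullptr if the symbol does not exist; never inserts.
    const Symbol* get(const std::string& name) const {
        auto it = symbols_.find(name);
        return it == symbols_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Symbol> symbols_;
};
```

Because the container is a private member, replacing `std::map` with another structure later only changes the three method bodies.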

Upvotes: 0

baol

Reputation: 4358

You say: "If the variable, still using vector, is found the first time, I can store its exact integer position in the vector with it.".

You can do the same with the map: search the variable using find and store the iterator pointing to it instead of the position.
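A sketch of that approach, assuming a reduced `Symbol` type. The `VarRef` struct and the `resolve` function are hypothetical names for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical symbol type, reduced to a single int.
struct Symbol {
    int value;
};

using Table = std::map<std::string, Symbol>;

// Hypothetical variable reference that remembers where its symbol lives.
struct VarRef {
    std::string name;
    bool cached = false;
    Table::iterator pos{};
};

// The first call does an O(log n) find; later calls reuse the stored iterator.
// Returns nullptr if the name is not in the table.
Symbol* resolve(Table& table, VarRef& ref) {
    if (!ref.cached) {
        auto it = table.find(ref.name);
        if (it == table.end()) {
            return nullptr;
        }
        ref.pos = it;
        ref.cached = true;
    }
    return &ref.pos->second;
}
```

Map iterators remain valid across insertions and across erasure of other elements, so the cached iterator only needs resetting if its own symbol is removed.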

Upvotes: 0

peterchen

Reputation: 41096

A std::map (O(log(n))) or a hash table ("amortized" O(1)) would be the first choice; use custom mechanisms only if you determine it's a bottleneck. Generally, using a hash or tokenizing the input is the first optimization.

Before you have profiled it, it's most important that you isolate lookup, so you can easily replace and profile it.


std::map is likely a tad slower for a small number of elements (but then, it doesn't really matter).

Upvotes: 1

Michael Burr

Reputation: 340168

If you're going to use a vector and go to the trouble of caching the most recent symbol lookup result, you could do the same if your symbol table were implemented as a map (though the cache probably wouldn't bring much benefit with a map). With a map you'd have the additional advantage that any non-cached symbol lookups would be much faster than searching a vector (assuming the vector isn't sorted; keeping a vector sorted can be expensive if you have to sort it more than once).
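For comparison, here is a sketch of the sorted-vector variant mentioned in the parenthesis, which keeps the order on insertion instead of re-sorting. All names (`Symbol`, `lowerBoundByName`, `insertSorted`, `findSorted`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol type, reduced to a name and an int.
struct Symbol {
    std::string name;
    int value;
};

// Binary search for the first element whose name is not less than `name`.
std::vector<Symbol>::iterator lowerBoundByName(std::vector<Symbol>& table,
                                               const std::string& name) {
    return std::lower_bound(
        table.begin(), table.end(), name,
        [](const Symbol& s, const std::string& n) { return s.name < n; });
}

// Keeps the vector ordered by name. The search is O(log n), but the
// insert itself is O(n) because later elements have to shift.
void insertSorted(std::vector<Symbol>& table, Symbol symbol) {
    auto pos = lowerBoundByName(table, symbol.name);
    if (pos != table.end() && pos->name == symbol.name) {
        pos->value = symbol.value;  // overwrite an existing symbol
    } else {
        table.insert(pos, std::move(symbol));
    }
}

// O(log n) lookup. Returns nullptr if the name is absent.
Symbol* findSorted(std::vector<Symbol>& table, const std::string& name) {
    auto pos = lowerBoundByName(table, name);
    return (pos != table.end() && pos->name == name) ? &*pos : nullptr;
}
```

Note that inserting into the middle shifts elements, so cached indices or pointers into such a vector would not survive later insertions.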

Take Neil's advice; map is generally a good data structure for a symbol table, but you need to make sure you're using it correctly (and not adding symbols accidentally).

Upvotes: 0

Klaim

Reputation: 69672

Map's operator[] is O(log(n)); see Wikipedia: http://en.wikipedia.org/wiki/Map_(C%2B%2B)

Since you look up symbols often, I think using a map is certainly right. A hash map (std::unordered_map) might improve your performance further.

Upvotes: 0

Dietrich Epp

Reputation: 213258

When most interpreters interpret code, they compile it into an intermediate language first. These intermediate languages often refer to variables by index or by pointer, instead of by name.

For example, Python (the C implementation) changes local variables into references by index, but global variables and class variables get referenced by name using a hash table.

I suggest looking at an introductory text on compilers.
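To make the index idea concrete, here is a very reduced sketch: a compile step assigns each distinct variable name a slot number once, and the running program then touches only a plain array of values. `SlotAllocator` and `slot` are hypothetical names, and this is not how any particular interpreter is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical compile-time helper: maps each distinct name to a slot index.
class SlotAllocator {
public:
    // Returns the existing slot for `name`, or assigns the next free one.
    std::size_t slotFor(const std::string& name) {
        auto it = slots_.find(name);
        if (it != slots_.end()) {
            return it->second;
        }
        std::size_t index = slots_.size();
        slots_.emplace(name, index);
        return index;
    }

    std::size_t count() const { return slots_.size(); }

private:
    std::unordered_map<std::string, std::size_t> slots_;
};

// Run-time access: no string comparison or hashing, just an index.
int& slot(std::vector<int>& frame, std::size_t index) {
    assert(index < frame.size());
    return frame[index];
}
```

At run time the interpreter would allocate a frame of `count()` values and execute instructions that carry slot numbers instead of names.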

Upvotes: 2

Daniel Earwicker

Reputation: 116654

Map is O(log N), so not as fast as positional lookup in an array. But the exact results will depend on a lot of factors, so the best approach is to interface with the container in a way that allows you to swap between implementations later on. That is, write a "lookup" function that can be efficiently implemented by any suitable container, so that you can switch and compare the speeds of different implementations.

Upvotes: 0

anon

Reputation:

A map is a good thing to use for a symbol table, but operator[] for maps is not. In general, unless you are writing some trivial code, you should use the map's member functions insert() and find() instead of operator[]. The semantics of operator[] are somewhat complicated, and almost certainly don't do what you want if the symbol you are looking for is not in the map.
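The difference can be sketched as follows, assuming a reduced `Symbol` type. The helpers `lookup` and `define` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical symbol type, reduced to a single int.
struct Symbol {
    int value;
};

using SymbolTable = std::map<std::string, Symbol>;

// find() reports a miss without changing the table.
const Symbol* lookup(const SymbolTable& table, const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}

// insert() leaves an existing entry untouched and reports whether
// a new symbol was actually added.
bool define(SymbolTable& table, const std::string& name, Symbol symbol) {
    return table.insert(std::make_pair(name, symbol)).second;
}
```

By contrast, evaluating `table["typo"]` on a missing key silently creates a default-constructed symbol under that name, so a misspelled variable would appear to exist from then on.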

As for the choice between map and unordered_map, the difference in performance is highly unlikely to be significant when implementing a simple interpretive language. If you use map, you are guaranteed it will be supported by all current Standard C++ implementations.

Upvotes: 17

Tronic

Reputation: 10430

std::map's operator[] takes O(log(n)) time. This means that it is quite efficient, but you should still avoid doing the same lookup over and over again. Instead of storing an index, perhaps you can store a reference to the value, or an iterator into the container? This avoids repeating the lookup entirely.

Upvotes: 2

Mike Dinsdale

Reputation: 1511

Normally you'd use a symbol table to look up the variable given its name as it appears in the source. In this case, you only have the name to work with, so there's nowhere to store the cached position of the variable in the symbol table. So I'd say a map is a good choice. The [] operator takes time proportional to the log of the number of elements in the map - if it turns out to be slow, you could use a hash map like std::tr1::unordered_map.

Upvotes: 4
