Reputation: 18605
I keep in mind that hash
would be first thing I should resort to if I want to write an application which requests high lookup speed, and any other data structure wouldn't guarantee that.
But I got confused when saw some many post saying different, such as suffix tree, trie, to name a few.
So I wonder is hash
always the best thing for high speed lookup? What if I want both high lookup speed and less space cost?
Is there any material (books or papers) lecturing about the data structures or algorithms **on high speed lookup and space efficiency? Any of this kind is highly appreciated.
Upvotes: 0
Views: 148
Reputation: 11562
I assume you are talking about strings here, and the answer is "no", hashes are not the fastest or most space efficient way to look up strings, tries are. Of course, writing a hashing algorithm is much, much easier than writing a trie.
One thing you won't find in wikipedia or books about tries is that if you naively implement them with one node per letter, you end up with large numbers of inefficient, one-child nodes. To make a trie that really burns up the CPU you have to implement nodes so that they can have a variable number of characters. This, of course, is even harder than writing a plain trie.
I have written trie implementations that handle over a billion entries and I can tell you that if done properly it is insanely fast, nothing else compares.
One other issue with tries is that you have to write a custom heap, because if you just use some kind of generic memory management it will be slow. So in addition to implementing the trie, you have to implement the heap that the trie runs on. Pretty freakin complicated, but if you do it, you get batshit crazy speed.
Upvotes: 1
Reputation: 11251
Hash tables are a good general purpose structure but they can fail spectacularly if the hash function doesn't suit the input data. Worst case lookup is O(n). They also waste some space as you mentioned. Other general-purpose structures like balanced binary search trees have worse average case but better worst case performance than a hash table. This is important for real-time applications. A trie is a more special-purpose structure tailored to string lookup.
Upvotes: 0
Reputation: 178521
So I wonder is hash always the best thing for high speed lookup?
No. As stated in comments:
There is never such a thing Best data structure for [some generic issue]. Everything is case dependent. Tries and radix trees might be great for strings, since you need to read the string anyway. arrays allows simplicity and great cache efficiency - and are usually the best for small scale static information
I once answered a related question of cases where a tree might be better then a hash table: Hash Table v/s Trees
What if I want both high lookup speed and less space cost?
The two might be self-contradicting. Even for the simple example of a hash table of size X
vs a hash table of size 2*X
. The bigger hash table is less likely to encounter collisions, and thus is expected to be faster then the smaller one.
Is there any material (books or papers) lecturing about the data structures or algorithms on high speed lookup and space efficiency?
Introduction to Algorithms provide a good walk through on the main data structure used. Any algorithm developed is trying to provide a good space and time efficiency, but like said, there is a trade off, and some algorithms might be better for specific cases then others.
Choosing the right algorithm/data structure/design for the specific problem is what engineering is about, isn't it?
Upvotes: 3
Reputation: 3031
That might also depend on the actual number of elements. In complexity theory a hash is not bad, but complexity theory is only good if the actual number of elements is bigger than some threshold.
I.e. if you have only 2 elements, there is a faster method than a hash ;-)
Upvotes: 0
Reputation: 6616
Only a good implementation of hash will give you good performance. And you cannot compare hash with Trie for all situations. Situations where Trie is applicable, is fast, but it can be costly in terms of memory, (again dependent on implementation).
But have you measured performance? Or it is unnecessary optimization you are looking for. Did the map fail you?
Upvotes: 1