Reputation: 39324
I'm reading a textbook, and it discusses a hash-list implementation. With regard to the hash table specifically, the textbook says:
The chaining method works reasonably well if the elements are evenly spread among the array positions, a situation called uniform hashing. For example, if we have 300 employees and an array size of 100, and if there are about 3 employees per position, give or take an employee, then we still have a search function that operates in O(1) time, since no more than 3 or 4 comparisons will be needed to find the right employee.
This is assuming we have an array (for the hash table) of 100 elements, each of which is a linked list used as the collision list for that position.
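For concreteness, here's a minimal sketch of the structure I understand the book to be describing; the class and method names are my own, not the textbook's:

```java
import java.util.LinkedList;

class ChainedHashTable {
    // One linked list per array position, holding all keys that hash there.
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int indexFor(String key) {
        // Map the key's hash code to a position in the array;
        // floorMod handles negative hash codes.
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void insert(String key) {
        buckets[indexFor(key)].add(key);
    }

    boolean contains(String key) {
        // Only the collision list at one position is scanned, so the cost
        // is proportional to that list's length, not to the total number
        // of elements in the table.
        return buckets[indexFor(key)].contains(key);
    }
}
```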
So, my question is:
This paragraph states that, given our hashing algorithm, we can search for an element in O(1) time. This surprises me, though, because the bigger your data set gets, the more collisions you'll have, and the longer your collision lists will get. The collision lists will grow slowly with n (the number of employees), but they will grow.
I would have thought this made the search run in O(n) time.
Are hash tables analyzed differently based on the hash function and the expected data-set size? Most algorithmic analyses don't assume a fixed data-set size, so it surprises and confuses me that this hash-table analysis seems to treat n as bounded.
Upvotes: 3
Views: 332
Reputation: 44250
The unit of work is one element. Looking up one element in a hash table of N items in M slots is an O(1) process.
Creating the hash table is, of course, (at least) an O(N) process. Looking up all elements is also (at least) an O(N) process.
The same logic applies to binary search or trees: O(some_function_of_N) is the amount of work needed to look up one element (given an array or tree of size N).
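A small illustration of that unit-of-work distinction, using Java's built-in HashMap (the 300-element figure is borrowed from the question):

```java
import java.util.HashMap;
import java.util.Map;

public class LookupCost {
    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        int n = 300;
        for (int i = 0; i < n; i++) {
            table.put(i, "employee-" + i);   // building the table: O(N) total
        }

        String one = table.get(42);          // one lookup: O(1) expected

        for (int i = 0; i < n; i++) {        // looking up all N keys: O(N),
            table.get(i);                    // even though each get() is O(1)
        }
    }
}
```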
Upvotes: -1
Reputation: 437664
The important detail not directly mentioned here is that hash tables are assumed to resize themselves once their load factor exceeds some threshold.
Your reasoning is correct at first sight, but it assumes the load factor is allowed to grow indefinitely (so that the collision lists slowly grow longer without any upper bound).
If the hash table is instead resized to keep the load factor under a constant number L, then by definition there will be on average at most L operations when searching a collision list. Since L is unrelated to N (the number of items in the table), a search is still constant time.
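A minimal sketch of that resize-on-threshold idea, assuming separate chaining; the class name, the doubling growth policy, and the 0.75 bound are illustrative choices, not from any particular library:

```java
import java.util.LinkedList;

class ResizingHashTable {
    private static final double MAX_LOAD_FACTOR = 0.75;  // the constant L
    private LinkedList<String>[] buckets;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ResizingHashTable() {
        buckets = new LinkedList[16];
        for (int i = 0; i < buckets.length; i++) buckets[i] = new LinkedList<>();
    }

    void insert(String key) {
        if ((double) (size + 1) / buckets.length > MAX_LOAD_FACTOR) {
            resize();  // keep the average chain length below the bound
        }
        buckets[Math.floorMod(key.hashCode(), buckets.length)].add(key);
        size++;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        LinkedList<String>[] old = buckets;
        buckets = new LinkedList[old.length * 2];
        for (int i = 0; i < buckets.length; i++) buckets[i] = new LinkedList<>();
        // Rehash every element into the larger array. This pass is O(N),
        // but it happens rarely enough that inserts stay O(1) amortized.
        for (LinkedList<String> chain : old) {
            for (String key : chain) {
                buckets[Math.floorMod(key.hashCode(), buckets.length)].add(key);
            }
        }
    }
}
```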
Upvotes: 4
Reputation: 36476
In a typical implementation, once the load factor of the hash table exceeds some bound (the current Java implementation uses 0.75), the table will dynamically resize itself.
This ensures that there are "enough" buckets for the keys, giving O(1) average performance.
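For reference, Java's HashMap exposes that threshold directly: you can pass an initial capacity and a load factor to the constructor:

```java
import java.util.HashMap;
import java.util.Map;

public class Demo {
    public static void main(String[] args) {
        // 100 initial buckets; the table resizes once it is more than
        // 75% full (0.75f is also HashMap's default load factor).
        Map<String, Integer> employees = new HashMap<>(100, 0.75f);
        employees.put("alice", 1);
    }
}
```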
Upvotes: 3