monogate

Reputation: 1439

Why is accessing an element of a dictionary by key O(1) even though the hash function may not be O(1)?

I see how you can access your collection by key. However, the hash function itself has a lot of operations behind the scenes, doesn't it?

Assuming you have a nice hash function which is very efficient, it still may take many operations.

Can this be explained?

Upvotes: 82

Views: 36032

Answers (9)

varun_k_21

Reputation: 43

We say that accessing a value by key in a hash table takes O(1). That doesn't mean it takes only one step to get the value; it means it takes a constant time "t", where "t" does not depend on the size of your data structure (e.g. Python's dict()).

Upvotes: 0

Ezra

Reputation: 49

Please see post What does "O(1) access time" mean?

The number of operations in a hash function is irrelevant as long as it takes the same (constant) amount of time for EVERY element in the collection. For example, if accessing one element in a collection of 2 elements takes .001 ms, accessing one element in a collection of 2,000,000,000 elements also takes .001 ms, even though the hash function may contain hundreds of if statements and multiple calculations.

Upvotes: 3

Martin Carpella

Reputation: 12603

If a dictionary/map is implemented as a HashMap, it has a best-case complexity of O(1), since in the best case, when there are no key collisions, retrieval requires only the calculation of the hash code of the key.

A hash-map may have a worst-case runtime complexity of O(n) if you have a lot of key collisions or a very bad hash function, since in this case it degrades to a linear scan of the entire array which holds the data.

Also, O(1) doesn't mean instant; it means the cost is constant. So choosing the right implementation for a dictionary may well depend on the number of elements in the collection, since a very high constant cost for the hash function hurts much more if there are only a few entries.

That's why dictionaries/maps are implemented differently for different scenarios. Java offers multiple implementations, C++'s std::map is typically a red-black tree (std::unordered_map is the hash-based one), etc. You choose among them based on the amount of data and on their best/average/worst-case runtime efficiency.

Upvotes: 13

VidasV

Reputation: 4895

It means that no matter how large your collection is, it will still take almost the same amount of time to retrieve any of its members.

In other words, a Dictionary with 5 members might take around 0.002 ms to access one of them, and a Dictionary with 25 members should take about the same. Big O describes how the cost grows with collection size, not the actual number of statements or functions executed.

Upvotes: 16

Sergey Kalinichenko

Reputation: 726987

the HashFunc itself has a lot of operations behind the scenes

That is certainly true. However, the number of these operations depends on the size of the key, not on the size of the hash table into which the key is inserted: the number of operations needed to compute the hash function is the same for a key in a table with ten entries or with ten thousand.

That is why a call to the hash function is often considered O(1). This works fine for fixed-size keys (integral values and fixed-length strings). It also provides a decent approximation for variable-sized keys with a practical upper limit.

Generally, though, access time of a hash table is O(k), where k is the upper limit on the size of the hash key.
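To make the O(k) point concrete, here is a minimal Python sketch. The names toy_hash and lookup_steps are made up for illustration, and the polynomial hash is only an assumption about what a string hash might look like, not how any particular runtime implements it. It counts loop iterations instead of measuring time, so the result is deterministic.

```python
def toy_hash(key):
    """Hypothetical polynomial string hash; returns (hash value, loop steps)."""
    h = 0
    steps = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % (2 ** 32)
        steps += 1  # one step per character of the key
    return h, steps


def lookup_steps(key, table_size):
    """Steps spent hashing `key` before indexing a table of `table_size` buckets."""
    h, steps = toy_hash(key)
    _bucket = h % table_size  # a single extra operation, whatever the table size
    return steps


short_small = lookup_steps("apple", 10)        # 5-character key, 10 buckets
short_large = lookup_steps("apple", 10_000)    # same key, 10,000 buckets
long_small = lookup_steps("a" * 1000, 10)      # 1000-character key, 10 buckets

print(short_small, short_large, long_small)    # 5 5 1000
```

The step count follows the key length (k) and ignores the table size, which is the distinction the answer draws.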

Upvotes: 123

Ed Avis

Reputation: 1502

Once you allow for the fact that larger and larger dictionaries take up more memory, going further down the cache hierarchy and eventually out to slow swap space on disk, it's hard to argue that it is truly O(1). The performance of the dictionary will get slower as it gets bigger, probably giving O(log N) time complexity. Don't believe me? Try it for yourself with 1, 100, 1000, 10000 and so on dictionary elements, up to say 100 billion, and measure how long it takes in practice to look up an element.

However if you make the simplifying assumption that all memory in your system is random access memory, and can be accessed in constant time, then you can claim that the dictionary is O(1). This assumption is common, even though it's not really true for any machine with disk swap space, and still pretty debatable in any case given the various levels of CPU cache.
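If you want to try the measurement, a rough Python sketch could look like the following. The helper name time_lookup, the chosen sizes, and the fixed seed are all assumptions for illustration. The sizes are kept small enough to run quickly, so they will mostly stay in cache or RAM; absolute numbers depend heavily on the machine, so no expected output is shown.

```python
import random
import time


def time_lookup(size, lookups=100_000, seed=0):
    """Average seconds per lookup of random existing keys in a dict of `size` entries."""
    rng = random.Random(seed)
    table = {i: i for i in range(size)}
    # Random keys, so that successive lookups touch different parts of memory.
    keys = [rng.randrange(size) for _ in range(lookups)]
    start = time.perf_counter()
    for k in keys:
        table[k]
    elapsed = time.perf_counter() - start
    return elapsed / lookups


timings = {size: time_lookup(size) for size in (10, 1_000, 100_000)}
for size, seconds in timings.items():
    print(f"{size:>8} entries: {seconds * 1e9:.1f} ns per lookup")
```

Note that the measured time includes the overhead of the Python loop itself, so treat the figures as a comparison between sizes rather than as the true cost of one lookup.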

Upvotes: 1

twihoX

Reputation: 189

Theoretically it is still O(n), because in the worst case all your keys may end up with the same hash and be bundled together, in which case you have to go through all of them linearly.
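A small sketch of that worst case, using a toy separate-chaining table written for this illustration (ChainedTable, spread_hash and constant_hash are hypothetical names, and real dictionary implementations differ in detail). It counts key comparisons per lookup for a hash that spreads keys out versus one that sends every key to the same bucket.

```python
class ChainedTable:
    """Toy hash table with separate chaining (illustrative only)."""

    def __init__(self, hash_fn, num_buckets=64):
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(num_buckets)]

    def put(self, key, value):
        bucket = self.buckets[self.hash_fn(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get_with_cost(self, key):
        """Return (value, number of key comparisons performed)."""
        bucket = self.buckets[self.hash_fn(key) % len(self.buckets)]
        for comparisons, (k, v) in enumerate(bucket, start=1):
            if k == key:
                return v, comparisons
        raise KeyError(key)


def spread_hash(key):
    return key  # distinct small ints land in distinct buckets


def constant_hash(key):
    return 0  # every key collides into bucket 0


good = ChainedTable(spread_hash)
bad = ChainedTable(constant_hash)
for i in range(50):
    good.put(i, str(i))
    bad.put(i, str(i))

good_cost = good.get_with_cost(49)[1]
bad_cost = bad.get_with_cost(49)[1]
print(good_cost, bad_cost)  # 1 50
```

With the colliding hash, finding the last inserted key means walking past all 49 earlier entries, so the lookup cost grows linearly with the number of stored keys.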

Upvotes: 7

Paarth

Reputation: 10397

O(1) doesn't mean instant. O(1) means constant without regard to the size of the data. The hash function takes a certain amount of time, but that amount of time doesn't scale with the size of the collection.

Upvotes: 148

JeReT

Reputation: 161

from the docs:

Retrieving a value by using its key is very fast, close to O(1), because the Dictionary<TKey,TValue> class is implemented as a hash table.

So it can be O(1) but might be slower. Here you can find another thread regarding hashtable performance: Hash table - why is it faster than arrays?

Upvotes: 1
