Reputation: 4333
I have a question about how Python handles data in dictionaries. Let's say I have a simple dictionary with numbers as keys and numbers as values, as shown below:
a = { 5: 3, 20: 1, 1: 1, 5: 2, 100: 3, 11: 6,
14: 1, 15: 2, 16: 4, 17: 2, 25: 1, 19: 1 }
I want to iterate through this dictionary and print out the keys. Every time I loop through the dictionary (as shown below) it prints the keys in increasing order.
This is what I want it to do, but I was wondering, for my own knowledge, why does this happen? Does it auto sort it in increasing order every time? As you can see in the dictionary above, the keys are clearly not in increasing order but the output below prints them in increasing order.
I'm just trying to gain a clear understanding. Any help would be greatly appreciated. Thanks
for i in a:
    print i
Output:
1
5
11
14
15
16
17
19
20
25
100
Upvotes: 2
Views: 1523
Reputation: 31484
Integer keys in a dictionary are not always iterated in sorted order:
a = {2:0, 9:0}
print a.keys() # [9, 2]
Python dictionaries are hash tables, a special kind of array in which the index of the cell where a value is stored is derived by applying a special function (let's call it the hash function) to the key.
This way, if you want to retrieve the value for a particular key, you can compute the hash function of the key again. It returns the same result as before, which gives you the index where the value is stored.
The hash function converts most types of data to an integer:
print hash(1) # 1
print hash('hello') # 840651671246116861
print hash((2,3)) # 3713082714463740756
Each type can define its own way to compute the hash, and the hash of an int is usually the int itself:
print hash(1) # 1
print hash(20) # 20
print hash(1000) # 1000
As you can see, the numbers get big quickly, and we don't want an array with 840651671246116861 cells just to store the string hello.
To avoid the problem, we can create an array with n elements and then use the remainder of the hash divided by n as the index.
For example, if we want to find the index for hello in an array of 8 elements:
print hash('hello') % 8 # 5
So our dictionary will know that the value for the key hello is at index 5. That is essentially how dictionaries are implemented.
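As a rough sketch of the index calculation described above: the helper name slot_index is made up for illustration, and it ignores what happens when two keys land in the same slot, which a real dictionary has to handle.

```python
def slot_index(key, n_slots=8):
    """Map a key to a cell index in a table of n_slots cells.

    Hypothetical helper for illustration only. This is not CPython's
    actual lookup code, and it ignores collisions.
    """
    return hash(key) % n_slots


# Integer keys make the arithmetic easy to follow, because small
# non-negative ints hash to themselves in CPython.
for key in (2, 9, 20, 100):
    print("%r -> slot %d" % (key, slot_index(key)))
```

String keys work the same way, but their hash values differ between Python versions and, in recent versions, between runs, so the slot for hello on your machine may not be 5.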
So why is {2:0, 9:0} not ordered by key? That's because Python dictionaries are created with 8 elements and grow as needed (more on this here).
Let's compute the index at which to store the data for key = 2 and key = 9 in a dictionary with n = 8:
print hash(2) % 8 # 2 [hash(2) = 2 and 2 % 8 = 2]
print hash(9) % 8 # 1 [hash(9) = 9 and 9 % 8 = 1]
This means that the array that contains the dictionary data will be:
| index | key | value |
|-------|-----|-------|
| 0 | | |
| 1 | 9 | 0 |
| 2 | 2 | 0 |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |
| 7 | | |
When iterating over the dictionary, the order is the one shown in this table, so 9 comes before 2.
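The table above can be reproduced with a small simulation. This is a simplified model, not CPython's real code: build_table and iteration_order are made-up names, collisions are resolved by stepping to the next free slot (CPython uses a different probing scheme), and the table never grows. It also deliberately avoids iterating over a real dict, since modern versions of Python preserve insertion order and would not show this effect.

```python
def build_table(keys, n_slots=8):
    """Place each key at hash(key) % n_slots, like the table above.

    Simplified model: on a collision, step to the next slot (linear
    probing). Assumes fewer keys than slots and that None is not a key.
    """
    slots = [None] * n_slots
    for key in keys:
        i = hash(key) % n_slots
        # Skip slots occupied by a different key.
        while slots[i] is not None and slots[i] != key:
            i = (i + 1) % n_slots
        slots[i] = key
    return slots


def iteration_order(slots):
    """Walk the slots from index 0 upward, skipping the empty ones."""
    return [key for key in slots if key is not None]


slots = build_table([2, 9])
print(slots)                   # [None, 9, 2, None, None, None, None, None]
print(iteration_order(slots))  # [9, 2]
```

With keys such as 1, 2 and 3, every key lands in the slot equal to its own value, so the same walk returns them in increasing order, which is the effect seen in the question.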
You can read more on the topic here.
Upvotes: 5
Reputation: 365707
If you want to know why Python always puts the keys in sorted order… the answer is that it doesn't.
If you want to know why some particular version of some particular implementation of Python puts your particular keys in sorted order, the only real answer to that is the source code.
For CPython (the implementation you're probably using, if you don't know which one you're using), the source is in Objects/dictobject.c. It changed dramatically in 3.4, and before that in… I think 2.6/3.2, and there have been a few other less dramatic changes in history. So you will have to make sure to look up the version you actually care about. For 3.4, the source is at http://hg.python.org/cpython/file/3.4/Objects/dictobject.c. It's in C, but there are some great comments explaining what it's doing. If you really want to explore it, you could probably even port it to Python and run it under pdb.
One key issue that may not be obvious from reading the code, unless you understand hash tables, is that there are two "coincidences" here, not just one. First, some versions of CPython, when given a smallish dict constructed all at once, will put the keys in order by their hash values. Second, in all versions of CPython so far, small integers hash to themselves, so—unlike almost any other type—"in order by hash value" also means "in order by value".
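The second coincidence is easy to check for yourself. The snippet below is only a quick check on CPython, not a guarantee made by the language, and the sample keys are picked arbitrarily from the question:

```python
# Small non-negative integers hash to themselves on CPython.
small_ints = range(1000)
ints_hash_to_themselves = all(hash(n) == n for n in small_ints)
print(ints_hash_to_themselves)

# So for such keys, "in order by hash value" and "in order by value"
# are the same ordering.
keys = [5, 20, 1, 100, 11]
by_hash = sorted(keys, key=hash)
print(by_hash == sorted(keys))
```

Strings and most other types don't behave this way, which is why string keys rarely came out looking sorted.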
Upvotes: 2
Reputation: 174624
Every time I loop through the dictionary (as shown below) it prints the keys in increasing order.
This is just by chance. Dictionaries are unordered collections of objects that are accessible by key.
There is no "auto sort", or any other kind of sort.
Just think about it for one second - the whole point of setting your own keys is to be able to fetch by them, so it is not important for the keys to have an "order" - the point is that you know how to refer to each object, because you set its key. This makes it very quick to fetch an object, because it's very easy to find. There are no duplicate keys, so internally the dictionary can be stored in an optimized way for fast access.
Compare this to a list which is ordered (and its order is guaranteed). In a list, the point is to fetch an object by its reference in the list - that is, by its position relative to other objects in the list. Therefore, it makes sense to maintain order.
Tuples are similar to lists in that they are ordered. One of the differences between tuples and lists is that tuples, once set, cannot be changed (you can't "grow" or "shrink" a tuple). In order to modify a tuple, you have to create another tuple. So to "grow" a tuple, add two tuples together to get a third, different tuple. The original two tuples are unchanged.
If you want to know the technical details behind the implementation of dictionaries and how they work "under the hood", this question has a great answer with all the sundry information.
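A quick illustration of "growing" a tuple (the variable names are arbitrary):

```python
first = (1, 2)
second = (3, 4)

# Adding two tuples builds a third tuple; neither original changes.
combined = first + second
print(combined)  # (1, 2, 3, 4)
print(first)     # (1, 2)
print(second)    # (3, 4)

# Trying to change a tuple in place raises TypeError.
try:
    first[0] = 99
except TypeError as exc:
    print("cannot modify a tuple: %s" % exc)
```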
Upvotes: 1
Reputation: 102
The docs say:
It is best to think of a dictionary as an unordered set of key: value pairs, with the requirement that the keys are unique
Unlike Python lists or tuples, the key-value pairs in dict objects are not in any particular order. Although you write the key-value pairs in a certain order when you create the dictionary, displaying the dict shows that they aren't necessarily stored in that order. If you want them sorted, just use the built-in sorted function.
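For example, with a few of the keys from the question (a minimal sketch; sorted returns a new list and leaves the dictionary itself untouched):

```python
a = {5: 2, 20: 1, 1: 1, 100: 3, 11: 6}

# Iterating over a dict yields its keys, so sorted(a) sorts the keys.
sorted_keys = sorted(a)
print(sorted_keys)  # [1, 5, 11, 20, 100]

# Loop over the keys in increasing order, whatever order the dict uses.
for key in sorted_keys:
    print("%d: %d" % (key, a[key]))
```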
Upvotes: -2