Why does removing duplicates from a list with set() give a different order each time?

Question

I know how to remove duplicates from a list. I'm just curious why a set does not keep the order of the original list.

>>> my_list = ['apple', 'mango', 'grape', 'apple', 'guava', 'pumpkin']
>>> [*set(my_list)]

# output (two different runs):
['mango', 'apple', 'grape', 'guava', 'pumpkin']
['pumpkin', 'guava', 'grape', 'mango', 'apple']

gimix · Accepted Answer

As all the comments say, a set is unordered, always.

But internally it uses a hash table, and IIRC the slot an element is stored in is the hash of the object modulo the table size. Now small integers tend to have themselves as their hash values, so you may have the impression that the result is sorted by value (it is not kept in insertion order), but this won't always be the case:

>>> ls = [1, 2, 3]
>>> [*set(ls)]
[1, 2, 3]

>>> ls = [2, 1, 3]
>>> [*set(ls)]
[1, 2, 3]

>>> ls2 = [-1, -2, 3]
>>> [*set(ls2)]
[3, -1, -2]

>>> ls2 = [-2, -1, 3]
>>> [*set(ls2)]
[3, -2, -1]
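To make the integer results above less mysterious, here is a rough sketch of a toy hash table. It is not CPython's actual set implementation: the function name `simulated_set_order` is made up for illustration, the table is assumed to have 8 slots (the size CPython starts a small set with), and collisions are resolved with plain linear probing, which is simpler than what CPython really does. It also relies on the CPython detail that `hash(-1)` is `-2`, so `-1` and `-2` collide.

```python
def simulated_set_order(values, table_size=8):
    """Return the iteration order of a toy hash table (illustration only)."""
    if len(set(values)) >= table_size:
        raise ValueError("toy table is too small for this many distinct values")
    slots = [None] * table_size  # None marks an empty slot
    for value in values:
        index = hash(value) % table_size  # slot = hash modulo table size
        # Linear probing: step forward until we find the value or a free slot.
        while slots[index] is not None and slots[index] != value:
            index = (index + 1) % table_size
        slots[index] = value
    # Iteration walks the slots from first to last, skipping empty ones.
    return [v for v in slots if v is not None]


print(simulated_set_order([1, 2, 3]))    # [1, 2, 3]
print(simulated_set_order([2, 1, 3]))    # [1, 2, 3]
print(simulated_set_order([-1, -2, 3]))  # [3, -1, -2]
print(simulated_set_order([-2, -1, 3]))  # [3, -2, -1]
```

Under these assumptions the toy table reproduces the four outputs shown above: the positive integers land in the slots matching their values, while `-1` and `-2` compete for the same slot, so whichever is inserted first comes out first.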

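Other objects, like the strings in your example, have very different hash values. On top of that, Python randomizes string hashes for each interpreter process by default (see PYTHONHASHSEED), so the same string gets a different hash, and the set a different order, from one run to the next:

>>> hash('mango')
-7062263298897675226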
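If you want to see the per-process string hashing for yourself, the following sketch starts fresh interpreters with a chosen `PYTHONHASHSEED` and reads back `hash('mango')`. The helper name `string_hash_in_new_process` is invented for this example, and it assumes that `sys.executable` points at a Python 3.7+ interpreter that may spawn child processes.

```python
import os
import subprocess
import sys


def string_hash_in_new_process(text, seed):
    """Return hash(text) as computed by a fresh interpreter with the given seed."""
    # Copy the current environment and pin the hash seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same seed: the hash is reproducible across processes.
print(string_hash_in_new_process("mango", 1) == string_hash_in_new_process("mango", 1))
# Different seeds: the hash (and hence the set order) is expected to differ.
print(string_hash_in_new_process("mango", 1) == string_hash_in_new_process("mango", 2))
```

Without a fixed seed, each run of the interpreter picks a random one, which is why the list built from the set of strings in the question comes out in a different order on different runs.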
