Reputation: 21
I'new to programming, when I was using python, I find the 'in' operation's performances on different data structures are quite different. For example:
a=list_a######list_a and list_b both are lists,data scale:300,000
b=set(list_b)
t1=time()
s=0
for entry in a:
if entry in b:
s+=1
t2=time()
print t2-t1
and I ended with result like this, which is very effiecient
0.0699999332428
however, when I search list_b without changing into set data structure
a=list_a
b=list_b
t1=time()
s=0
for entry in a:
if entry in b:
s+=1
t2=time()
print t2-t1
and this time the result took almost ten minutes
539.641000032
I have searched the Internet, and found this is somehow related to hash map, but still confusing. Can anyone please explain this in detail or are there other data structures in python similar to this?
thanks in advance.
Upvotes: 1
Views: 100
Reputation: 600026
Lists have linear time lookup. That's because to find whether an item is in a list, Python needs to scan through every item until it finds a match; so the time it takes is proportional to the length of the list. The longer the list, the longer it will take. In computer science terms, this is called O(n)
time complexity.
Sets and dictionaries have constant time lookups. Instead of just storing the elements in a series, indexed only by position, they store a hash of the value. To find whether there is a matching item, Python hashes the value and goes to the matching index. No matter how big the set, it will always take the same amount of time - this is known as O(1)
complexity.
Upvotes: 3