Reputation: 3495
I know the simple way to search would be to have a list containing the strings, and just do if string in list
, but it gets slow, and I've heard dictionary keys practically have no slowdown with large sets due to the fact they're not ordered.
However, I don't need any extra information relating to the items, so it feels a little wrong making a dictionary just to hold the keys and set the values to None
.
Is there something I can use that acts like dictionary keys speed wise, but acts like a list?
Here's a quick example:
import time, random
totalRange = 100000
searchFor = 5000
#Create a list of 10 million characters
searchableList = []
for i in range( totalRange ):
searchableList.append( random.randint( 0, totalRange ) )
#Create dictonary with keys set to 'None'
searchableDict = {}
for i in searchableList:
searchableDict[i] = None
searchableSet = set( searchableList )
#Search list
startTime = time.time()
numberMatches = 0
for number in range( searchFor ):
if number in searchableList:
numberMatches += 1
print numberMatches, time.time()-startTime
#Search dictionary keys
startTime = time.time()
numberMatches = 0
for number in range( searchFor ):
if number in searchableDict:
numberMatches += 1
print numberMatches, time.time()-startTime
#Search set
startTime = time.time()
numberMatches = 0
for number in range( searchFor ):
if number in searchableSet:
numberMatches += 1
print numberMatches, time.time()-startTime
Here are the time outputs:
List: 18.8 seconds
Set: 0.002 seconds
Dictionary: 0.0009 seconds
Even though set is a lot faster than a list, the dictionary is still twice as fast, so I'm wondering if there's anything else I don't know about. Using a dictionary wouldn't be too bad, I'd just imagine there was a cleaner way to do it than dictionary[key]=None
.
Edit based on iCodez's answer:
Tests when totalRange=1000000
and searchFor=50000
(10x higher):
List = 20 minutes and still going
Dictionary = 0.023 seconds
Set = 0.02 seconds
Set.intersection = 0.008 seconds
With more calculations it seems sets and dictionaries have a very similar efficiency, but the set.intersetion
way is clearly a lot better.
Upvotes: 5
Views: 930
Reputation:
You should use a set in this case. Sets have the same lookup time as dictionaries (constant), but they consist of individual items instead of key/value pairs. So, you get the same speed for less memory and a better representation of the data.
Also, you would improve efficiency by using set.intersection
instead of a for-loop:
numberMatches = len(searchableSet.intersection(xrange(searchFor)))
You'll notice too that I replaced range
with xrange
. This keeps Python from building an unnecessary list and thereby wasting memory.
Upvotes: 7