Peter
Peter

Reputation: 3495

What's the most efficient way to search a list millions of times?

I know the simple way to search would be to have a list containing the strings, and just do if string in list, but it gets slow, and I've heard dictionary keys practically have no slowdown with large sets due to the fact they're not ordered.

However, I don't need any extra information relating to the items, so it feels a little wrong making a dictionary just to hold the keys and set the values to None.

Is there something I can use that acts like dictionary keys speed wise, but acts like a list?

Here's a quick example:

import time, random

totalRange = 100000
searchFor = 5000

#Create a list of 10 million characters
searchableList = []
for i in range( totalRange ):
    searchableList.append( random.randint( 0, totalRange ) )

#Create dictonary with keys set to 'None'
searchableDict = {}
for i in searchableList:
    searchableDict[i] = None

searchableSet = set( searchableList )

#Search list
startTime = time.time()
numberMatches = 0
for number in range( searchFor ):
    if number in searchableList:
        numberMatches += 1
print numberMatches, time.time()-startTime

#Search dictionary keys
startTime = time.time()
numberMatches = 0
for number in range( searchFor ):
    if number in searchableDict:
        numberMatches += 1
print numberMatches, time.time()-startTime

#Search set
startTime = time.time()
numberMatches = 0
for number in range( searchFor ):
    if number in searchableSet:
        numberMatches += 1
print numberMatches, time.time()-startTime

Here are the time outputs:

List: 18.8 seconds
Set: 0.002 seconds
Dictionary: 0.0009 seconds

Even though set is a lot faster than a list, the dictionary is still twice as fast, so I'm wondering if there's anything else I don't know about. Using a dictionary wouldn't be too bad, I'd just imagine there was a cleaner way to do it than dictionary[key]=None.



Edit based on iCodez's answer:

Tests when totalRange=1000000 and searchFor=50000 (10x higher):

List = 20 minutes and still going
Dictionary = 0.023 seconds
Set = 0.02 seconds
Set.intersection = 0.008 seconds

With more calculations it seems sets and dictionaries have a very similar efficiency, but the set.intersetion way is clearly a lot better.

Upvotes: 5

Views: 930

Answers (2)

user2555451
user2555451

Reputation:

You should use a set in this case. Sets have the same lookup time as dictionaries (constant), but they consist of individual items instead of key/value pairs. So, you get the same speed for less memory and a better representation of the data.


Also, you would improve efficiency by using set.intersection instead of a for-loop:

numberMatches = len(searchableSet.intersection(xrange(searchFor)))

You'll notice too that I replaced range with xrange. This keeps Python from building an unnecessary list and thereby wasting memory.

Upvotes: 7

Joran Beasley
Joran Beasley

Reputation: 113950

use

a_dict = dict.fromkeys(my_text.split())

Upvotes: 4

Related Questions