Programming Noob
Programming Noob

Reputation: 1845

Is there a faster way to check if a value is in a set in python?

I want to maintain a list of strings and keep checking for different variables if they are in the list.

What I'm doing is :

li=[one,two,three]
for x in different_x_values:
  if(x in li):
    print 'yes'

Is there a faster way than using x in li that checks if a some string is in some other collection of strings?

Upvotes: 2

Views: 1190

Answers (4)

specialscope
specialscope

Reputation: 4218

I am sure you already got your answer, this is for the sake of completeness. Out of curiosity I ran the tests for the methods mentioned I got following results.

Size=size of main list
setopttime: the one that uses s1&s2
setintersectiontime: the one using intersection operation
bruteforcetime:burte force time

Size=1000
setoptime:0:00:00.001000
setintersectiontime:0:00:00.001000
bruteforcetime:0:00:00.005000

Size=10000
setoptime:0:00:00.001000
setintersectiontime:0:00:00.010000
bruteforcetime:0:00:00.367000

Size=100000
setoptime:0:00:00.001000
setintersectiontime:0:00:00.115000
bruteforcetime:0:00:35.444000

Method 1 (setopttime) wins hands down. Here is the code if you want to check.

import timeit
import datetime
def getList(size):
    s=[]
    for i in range(size):
        s.append(str(i))
    return s

def getSubList(size):
    s=[]
    for i in range(size/2):
        s.append(str(i))
    return s

def testSet(size):
    s=set(getList(size))
    sublist=set(getSubList(size))
    list(s&sublist)


def testIntersection(size):
    s=set(getList(size))
    sublist=set(getSubList(size))
    list(s.intersection(sublist))

def bruteForce(size):
    s=getList(size)
    sublist=getSubList(size)
    final=[]
    for elem in sublist:
        try:
            if s.index(elem)!=-1:
                final.append(elem)
        except ValueError:
            pass
currentsize=1000
for i in range(3):
    t1=datetime.datetime.now()
    testSet(1000)
    t2=datetime.datetime.now()
    testIntersection(currentsize)
    t3=datetime.datetime.now()
    bruteForce(currentsize)
    t4=datetime.datetime.now()
    print "Size="+str(currentsize)
    print "setoptime:"+str(t2-t1)
    print "setintersectiontime:"+str(t3-t2)
    print "bruteforcetime:"+str(t4-t3)
    currentsize=currentsize*10

Upvotes: 1

Joran Beasley
Joran Beasley

Reputation: 113940

print set(different_values).intersection(li)

should work the same ... might be faster..

>>> other_values = ["franks","pies","pie","beans"]
>>> li = ["apple","pie","franks","and","beans"]
>>> set(other_values).intersection(li)
set(['franks', 'beans', 'pie'])

here are some benchmarks

In [1]: other_values = range(1000)
In [2]: li = range(500,1000)
In [3]: %timeit set(other_values).intersection(li)
10000 loops, best of 3: 78.6 us per loop
In [4]: %timeit [x for x in other_values if x in li]
100 loops, best of 3: 8.7 ms per loop
In [5]: %timeit set(other_values) & set(li)
10000 loops, best of 3: 97.6 us per loop

so looks like set intersection is hugely faster

Upvotes: 4

Keith
Keith

Reputation: 43024

Use a set object for this.

Python2> s = set(["one", "two", "three"])
Python2> "one" in s
True

Python2> s2 = set(["three", "four", "five"])

Intersection:

Python2> s & s2
set(['three'])

Python2> x = "three"
Python2> x in s & s2
True

I think that is what you want, the intersection of two sets. ALso the intent is more readable, I think, if you know about set objects.

Upvotes: 4

KevinL
KevinL

Reputation: 170

Check the builtin "set()" datatype in more recent python versions - it should be faster than lists, I think. Dictionary keys used to be the fast way, but I think sets are equal or faster now, so if you're working with a pre-2.5 or so version of python you might do the following (doesn't matter what the values in the dict are):

li = {'one':1, 'two': 1, 'three':1}
for x in different_values:
    if x in li:
        print 'yes'

More recent versions, just make li a set().

Upvotes: 4

Related Questions