ymbirtt
ymbirtt

Reputation: 1666

Why is testing for inclusion faster with a dict than a set?

I'm aware that under-the-hood, a Python set and a Python dict are very similar. Reading their respecective sources - dict and set - it's clear that they're doing their lookups near identically. In reading this answer, I decided I'd test the author's claim that "set lookups are faster than dict lookups" by hacking together the following mockup:

from timeit import timeit
import random

universe = range(1,100000)
keys = random.sample(universe, 50000)
lookups = random.sample(universe, 50000)
dict_set = dict((k,True) for k in keys)
set_set = set(keys)

def dict_lookup():
    for l in lookups:
        l in dict_set

def set_lookup():
    for l in lookups:
        l in set_set

if __name__ == '__main__':
    set_victories = 0
    dict_victories = 0
    for i in range(100):
        dict_time = timeit('dict_lookup()', setup="from __main__ import dict_lookup", number=10000)
        set_time = timeit('set_lookup()', setup="from __main__ import set_lookup", number=10000)
        print("dict time: {}".format(dict_time))
        print("set time:  {}".format(set_time))
        if set_time < dict_time:
            set_victories += 1
        else:
            dict_victories += 1
    print("Sets were faster in  {} trials".format(set_victories))
    print("Dicts were faster in {} trials".format(dict_victories))

The expected result was that set lookups and dict lookups would have indistinguishable performance, given their implementations. What I actually found was the following final result:

$ python3 --version
Python 3.4.5
$ python3 sets-vs-dicts.py
<snip - see below for full output>
Sets were faster in  2 trials
Dicts were faster in 98 trials

So the dict is actually consistently faster than the set. Naturally I'm not suggesting that we should all ditch sets and use the faster dict since sets make the intent of the programmer much clearer and the difference is pathetically small given the size of the tests. I do, however, find this result incredibly strange. What's going on here?

If you're curious, the full output is below:

$ python3 set-vs-dict.py
dict time: 57.754860900342464
set time:  56.8056653002277
dict time: 50.8890880998224
set time:  50.642351899296045
dict time: 49.936297399923205
set time:  50.66272980067879
dict time: 49.92973940074444
set time:  50.65518939960748
dict time: 49.949383799917996
set time:  50.66877659969032
dict time: 49.93578719999641
set time:  50.64872649963945
dict time: 49.96432110015303
set time:  50.676835800521076
dict time: 49.95099350064993
set time:  50.64867010060698
dict time: 49.98275039996952
set time:  50.648987299762666
dict time: 49.92164439987391
set time:  50.66931669972837
dict time: 49.98953749984503
set time:  50.652459900826216
dict time: 49.95234560035169
set time:  50.65124330017716
dict time: 49.98174169939011
set time:  50.6712632002309
dict time: 49.93824000004679
set time:  50.65437529981136
dict time: 49.95089349988848
set time:  50.65370349958539
dict time: 49.963413699530065
set time:  50.65550949983299
dict time: 49.955208600498736
set time:  50.66121090017259
dict time: 49.94347499962896
set time:  50.64449250046164
dict time: 49.95420549996197
set time:  50.66687630023807
dict time: 49.92143050022423
set time:  50.64667259994894
dict time: 50.05037229973823
set time:  50.67966340016574
dict time: 49.93846719991416
set time:  50.64651320036501
dict time: 49.921281000599265
set time:  50.67906459979713
dict time: 49.942994699813426
set time:  50.65166569966823
dict time: 49.94313340075314
set time:  50.656177499331534
dict time: 49.94610709976405
set time:  50.65122799947858
dict time: 49.93874369934201
set time:  50.661101600155234
dict time: 49.94996269978583
set time:  50.63938449975103
dict time: 49.9602530002594
set time:  50.65474760066718
dict time: 49.91891669947654
set time:  50.663624899461865
dict time: 49.959330099634826
set time:  50.653377699665725
dict time: 49.98555530048907
set time:  50.64655719976872
dict time: 49.945239200256765
set time:  50.65128379967064
dict time: 49.95342260040343
set time:  50.65899199992418
dict time: 49.92802210059017
set time:  50.67100259941071
dict time: 49.942902400158346
set time:  50.74889140017331
dict time: 49.994800799526274
set time:  50.731577299535275
dict time: 49.98310230020434
set time:  50.747778999619186
dict time: 49.99376400001347
set time:  50.73122859932482
dict time: 50.00640409998596
set time:  50.68737949989736
dict time: 49.94556000083685
set time:  50.722481600008905
dict time: 49.98192979954183
set time:  50.72525530029088
dict time: 49.99698970001191
set time:  50.736096899956465
dict time: 49.94320739991963
set time:  50.71096289996058
dict time: 49.972679699771106
set time:  50.71838010009378
dict time: 49.957800599746406
set time:  50.747396499849856
dict time: 49.97235369961709
set time:  50.69941039942205
dict time: 49.951399500481784
set time:  50.647985899820924
dict time: 49.94027389958501
set time:  50.66828709933907
dict time: 49.94174600020051
set time:  50.65279300045222
dict time: 49.96716000046581
set time:  50.64943030010909
dict time: 49.95117200072855
set time:  50.65525580011308
dict time: 49.962328700348735
set time:  50.66319840028882
dict time: 49.960031100548804
set time:  50.672181099653244
dict time: 49.93908840045333
set time:  50.651302699930966
dict time: 49.94130470044911
set time:  50.655242399312556
dict time: 50.04310019966215
set time:  50.67391949985176
dict time: 49.93010629992932
set time:  50.64970660023391
dict time: 49.991717299446464
set time:  50.65591560024768
dict time: 49.952454400248826
set time:  50.649492600001395
dict time: 49.92677689995617
set time:  50.635977199301124
dict time: 49.95432769972831
set time:  50.64075019955635
dict time: 49.94808299932629
set time:  50.664196100085974
dict time: 49.966013699769974
set time:  50.649582100100815
dict time: 49.9813024001196
set time:  50.64982909988612
dict time: 49.93897459935397
set time:  50.66509110014886
dict time: 49.95878900028765
set time:  50.649003400467336
dict time: 49.96674569975585
set time:  50.69693780038506
dict time: 49.91303739976138
set time:  50.675189800560474
dict time: 49.950330699793994
set time:  50.64532170072198
dict time: 49.95022019930184
set time:  50.65448010060936
dict time: 49.95197269972414
set time:  50.65391890052706
dict time: 49.94361769966781
set time:  50.67086180020124
dict time: 49.95455109979957
set time:  50.670443600043654
dict time: 49.94633509963751
set time:  50.65955980028957
dict time: 49.967472000047565
set time:  50.66301089990884
dict time: 49.95830660033971
set time:  50.67482869978994
dict time: 49.984512499533594
set time:  50.67321899998933
dict time: 50.01141999941319
set time:  50.84260869957507
dict time: 50.31206789985299
set time:  51.02959220018238
dict time: 50.28449110034853
set time:  51.03110689949244
dict time: 50.303432799875736
set time:  51.02032170072198
dict time: 50.281682999804616
set time:  51.05188430007547
dict time: 50.30898350011557
set time:  51.01742030028254
dict time: 50.3027657000348
set time:  51.02114639990032
dict time: 50.00038649979979
set time:  50.65360379964113
dict time: 49.93306410033256
set time:  50.63413709960878
dict time: 49.95266539976001
set time:  50.65499630011618
dict time: 49.94854210037738
set time:  50.703547400422394
dict time: 49.96691229939461
set time:  50.69470370002091
dict time: 49.95223430078477
set time:  50.70982529968023
dict time: 49.954243999905884
set time:  50.791720499284565
dict time: 49.97948960028589
set time:  50.69436000008136
dict time: 49.98102519940585
set time:  50.73820179980248
dict time: 49.96782180014998
set time:  50.722959300503135
dict time: 49.9863857999444
set time:  50.70789400022477
dict time: 49.9592831004411
set time:  50.707397900521755
dict time: 49.94034240022302
set time:  50.667025099508464
dict time: 49.96215169969946
set time:  50.72984409984201
dict time: 49.98776920046657
set time:  50.72097889985889
Sets were faster in  2 trials
Dicts were faster in 98 trials

Upvotes: 1

Views: 742

Answers (1)

Simon
Simon

Reputation: 424

When I tested your code I thought the numbers were maybe a little to small. So I increased them by a factor 10 and let random.sample have a 1 number in 100 numbers ratio.

import random
from time import time


def timeit(func):
    def wrap(*args):
        start = time()
        result = func(*args)
        return time()-start
    return wrap


def get_set_and_dict():
    universe = range(1, 10**8)
    keys = random.sample(universe, 10**6)
    lookups = random.sample(universe,10**6)
    dict_set = dict((k,True) for k in keys)
    set_set = set(keys)
    return dict_set, set_set, lookups


@timeit
def test(container, lookups):

    for i in lookups:
        a = i in container


def main():
    dict_set, set_set, lookups = get_set_and_dict()
    acc_set = acc_dict = 0
    rounds = 100
    for _ in range(rounds):
        acc_dict += test(dict_set, lookups)
        acc_set += test(set_set, lookups)
    print("Set time: {:.4f}s\n Dict time: {:.4f}s".format(acc_set/rounds, acc_dict/rounds))

if __name__ == '__main__':
    main()

>> Set time: 0.1263s
>> Dict time: 0.1578s

But it would make sense if set and dict differed since they are not, even if alike, the same thing.


Maybe just depending on how you setup your experiment, the conclusions will differ.

Upvotes: 1

Related Questions