mikeLundquist
mikeLundquist

Reputation: 1009

how to intersect a list and set in python and preserve the order of the list?

How can I intersect a list and a set to preserve the order of the list? Simple example:

k=[1,2,3,4]
d={3,2}
d.intersection(k)
[2,3]#this is the ideal result

edit: speed is the most important factor here

Upvotes: 1

Views: 1138

Answers (1)

Martijn Pieters
Martijn Pieters

Reputation: 1124170

You'll have to use a list comprehension, which lets you keep the order by filtering:

[i for i in k if i in d]

Demo:

>>> k = [1, 2, 3, 4]
>>> d = {2, 3}
>>> [i for i in k if i in d]
[2, 3]

For Python 3, this is the fastest option available; you could use list(filter(d.__contains__, l)) but this is slower on this trivially small example:

>>> from timeit import timeit
>>> def listcomp(k, d):
...     return [i for i in k if i in d]
...
>>> def filtered(k, d):
...     return list(filter(d.__contains__, k))
...
>>> timeit('listcomp(k, d)', 'from __main__ import listcomp, k, d')
0.49590064199946937
>>> timeit('filtered(k, d)', 'from __main__ import filtered, k, d')
0.6533352420010488

and timings worsen when the data sets grow in size:

>>> import random
>>> k = sorted([random.randrange(1000) for _ in range(1000)])
>>> d = {random.randrange(1000) for _ in range(100)}
>>> timeit('listcomp(k, d)', 'from __main__ import listcomp, k, d', number=10000)
0.30027976899873465
>>> timeit('filtered(k, d)', 'from __main__ import filtered, k, d', number=10000)
0.4524774450001132

In Python 2, filter() is the faster option, given large enough inputs:

>>> from timeit import timeit
>>> import random
>>> def listcomp(k, d):
...     return [i for i in k if i in d]
...
>>> def filtered(k, d):
...     return filter(d.__contains__, k)
...
>>> k = [1, 2, 3, 4]
>>> d = {2, 3}
>>> timeit('listcomp(k, d)', 'from __main__ import listcomp, k, d')
0.4015800952911377
>>> timeit('filtered(k, d)', 'from __main__ import filtered, k, d')
0.4407978057861328
>>> k = sorted([random.randrange(1000) for _ in range(1000)])
>>> d = {random.randrange(1000) for _ in range(100)}
>>> timeit('listcomp(k, d)', 'from __main__ import listcomp, k, d', number=10000)
0.4594550132751465
>>> timeit('filtered(k, d)', 'from __main__ import filtered, k, d', number=10000)
0.28088998794555664

I'm not aware of any faster options; the available ordered set implementations are all Pure-python solutions, and slower still.

Upvotes: 7

Related Questions