Reputation: 935
Let's take these sample lists :
L_main = [1,2,3,4,5,6,7,8,9]
L1 = [1,7,3,12]
L2 = [0,2,51,5,9]
L3 = [3,2,8]
I would like to create a list which contains the values of L_main that are neither in L1, nor in L2, nor in L3. The following code does the job, but it is really slow with big lists :
[i for i in L_main if i not in L1 and i not in L2 and i not in L3]
Do you know please a more efficient way ?
Expected result :
[4,6]
Upvotes: 0
Views: 229
Reputation: 714
TL;DR
Time trials on a modest laptop favour
list(set(L_main).difference(L1,L2,L3))
which for large lists offers dramatic improvements over a naive list comprehension or its for
loop near-equivalent.
Detail
Surprisingly enough using a dict seems to hold its own (but is not good enough to justify switching from a good set
implementation):
from itertools import chain
excludes = dict((n,None) for n in chain(L1, L2, L3))
[n for n in L_main if n not in excludes]
A defaultdict can also be used but the modification is probably pointless (included in the analysis below for posterity).
Some horse races on a modest laptop using 3.9 (CPython):
Original sample data (unsurprisingly doesn't tell us much):
%%timeit #used for all of these, not shown later
# OP starting solution
i for i in L_main if i not in L1 and i not in L2 and i not in L3] # is slow for big lists
3.82 µs ± 22.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Set v1
list(set(L_main).difference(L1+L2+L3))
3.75 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Set v2
list(set(L_main).difference(L1,L2,L3))
3.57 µs ± 142 ns
# Set v3
list(set(L_main)-set(L1+L2+L3))
3.57 µs ± 48.6 ns
# Set v4
l = set.union(set(L1),set(L2), set(L3))
list(set(L_main)-l)
5.06 µs ± 176 ns
# For loop
final_list = []
for num in L_main:
if num not in L1 and num not in L2 and num not in L3:
final_list += [num]
final_list
3.9 µs ± 98.1 ns
# Dict v1
excludes = dict((n,None) for n in chain(L1, L2, L3))
[n for n in L_main if n not in excludes]
9.09 µs ± 115 ns # worse here, but wait!
# Dict v2 (defaultdict)
excludes = defaultdict(None, ((n, None) for n in chain(L1, L2, L3)))
[n for n in L_main if n not in excludes]
12.9 µs ± 72.5 ns
Now on a moderately large set, i.e.
from random import randint
L_main = range(10000)
L1 = [randint(900, 20000) for _ in range(1000)]
L2 = [randint(900, 20000) for _ in range(2000)]
L3 = [randint(900, 20000) for _ in range(3000)]
len(L_main), len(L1), len(L2), len(L3)
(10000, 1000, 2000, 3000)
Results from slowest to fastest:
2200ms OP Starting solution
2160ms For loop
7.7ms Set v3 (+/- .7ms)
5.8ms Set v4 (+/- .6ms)
5.1ms Dict v1 (+/- .2ms)
5.0ms Dict v2 (+/- .2ms) # Not significantly different from v1
4.4ms Set v1 (+/- .6ms)
3.7ms Set v2 (+/- .3ms)
Finally, with an additional ~9x upsize in the input lists, i.e.
from random import randint
L_main = range(50000)
L1 = [randint(900, 80000) for _ in range(9000)]
L2 = [randint(900, 80000) for _ in range(15000)]
L3 = [randint(900, 80000) for _ in range(30000)]
len(L_main), len(L1), len(L2), len(L3)
(50000, 9000, 15000, 30000)
We have:
SKIP OP Starting solution
SKIP For loop
58ms Set v4 (+/- 1ms)
53ms Dict v2 (+/- 3ms) # Not significantly diff from v1
51ms Dict v1 (+/- 3ms)
50ms Set v3 (+/- 1ms)
42ms Set v1 (+/- 2ms)
38ms Set v2 (+/- 1ms) # Enough of an edge to be statistically significant
Upvotes: 0
Reputation: 421
Change the list to sets and change your code.
l = set.union(set(l1),set(l2), set(l3))
list(set(L_main)-l)
Upvotes: 0
Reputation: 41
Using a simple for can be quicker in this case.
final_list = []
for num in L_main:
if num not in L1 and num not in L2 and num not in L3:
final_list.append(num)
I know that semantically it is the same of the example in your question, but you can use time.time()
to measure the time for big lists.
You can also change the .append()
and use the +=
operator, it is also quicker:
final_list = []
for num in L_main:
if num not in L1 and num not in L2 and num not in L3:
final_list += [num]
Cheers.
EDIT: I also like a lot the set differences proposed, my answer is just in case you do not want to change the structure.
Upvotes: 0
Reputation: 16147
.difference
is probably better, but worth noting that you can subtract sets as well.
set(L_main)-set(L1+L2+L3)
Upvotes: 0
Reputation: 117886
You can compute the set.difference
of all the latter lists from the first one.
>>> set(L_main).difference(L1,L2,L3)
{4, 6}
Upvotes: 3
Reputation: 51345
You can use the set
differences between L_main
and the rest of your lists concatentated together:
>>> list(set(L_main).difference(L1+L2+L3))
[4, 6]
Upvotes: 5