Reputation: 935

Find values in a list that are not in others lists

Let's take these sample lists :

L_main = [1,2,3,4,5,6,7,8,9]
L1 = [1,7,3,12]
L2 = [0,2,51,5,9]
L3 = [3,2,8]

I would like to create a list which contains the values of L_main that are neither in L1, nor in L2, nor in L3. The following code does the job, but it is really slow with big lists :

[i for i in L_main if i not in L1 and i not in L2 and i not in L3]

Do you know please a more efficient way ?
Expected result :

[4,6]

Upvotes: 0

Answers (6)

Jamie Deith

Reputation: 714

TL;DR

Time trials on a modest laptop favour

list(set(L_main).difference(L1,L2,L3))

which for large lists offers dramatic improvements over a naive list comprehension or its for loop near-equivalent.

Detail

Surprisingly enough using a dict seems to hold its own (but is not good enough to justify switching from a good set implementation):

from itertools import chain
excludes = dict((n,None) for n in chain(L1, L2, L3))
[n for n in L_main if n not in excludes]

A defaultdict can also be used but the modification is probably pointless (included in the analysis below for posterity).

Some horse races on a modest laptop using 3.9 (CPython):

Original sample data (unsurprisingly doesn't tell us much):

%%timeit #used for all of these, not shown later
# OP starting solution
i for i in L_main if i not in L1 and i not in L2 and i not in L3] # is slow for big lists
3.82 µs ± 22.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Set v1
list(set(L_main).difference(L1+L2+L3))
3.75 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Set v2
list(set(L_main).difference(L1,L2,L3))
3.57 µs ± 142 ns

# Set v3
list(set(L_main)-set(L1+L2+L3))
3.57 µs ± 48.6 ns

# Set v4
l = set.union(set(L1),set(L2), set(L3))
list(set(L_main)-l)
5.06 µs ± 176 ns

# For loop
final_list = []
for num in L_main:
    if num not in L1 and num not in L2 and num not in L3:
         final_list += [num]
final_list
3.9 µs ± 98.1 ns

# Dict v1
excludes = dict((n,None) for n in chain(L1, L2, L3))
[n for n in L_main if n not in excludes]
9.09 µs ± 115 ns # worse here, but wait!

# Dict v2 (defaultdict)
excludes = defaultdict(None, ((n, None) for n in chain(L1, L2, L3)))
[n for n in L_main if n not in excludes]
12.9 µs ± 72.5 ns

Now on a moderately large set, i.e.

from random import randint
L_main = range(10000)
L1 = [randint(900, 20000) for _ in range(1000)]
L2 = [randint(900, 20000) for _ in range(2000)]
L3 = [randint(900, 20000) for _ in range(3000)]
len(L_main), len(L1), len(L2), len(L3)
(10000, 1000, 2000, 3000)

Results from slowest to fastest:

2200ms OP Starting solution
2160ms For loop
7.7ms Set v3 (+/- .7ms) 
5.8ms Set v4 (+/- .6ms)
5.1ms Dict v1 (+/- .2ms)
5.0ms Dict v2 (+/- .2ms) # Not significantly different from v1
4.4ms Set v1 (+/- .6ms)
3.7ms Set v2 (+/- .3ms)

Finally, with an additional ~9x upsize in the input lists, i.e.

from random import randint
L_main = range(50000)
L1 = [randint(900, 80000) for _ in range(9000)]
L2 = [randint(900, 80000) for _ in range(15000)]
L3 = [randint(900, 80000) for _ in range(30000)]
len(L_main), len(L1), len(L2), len(L3)
(50000, 9000, 15000, 30000)

We have:

SKIP  OP Starting solution
SKIP  For loop
58ms Set v4 (+/- 1ms)
53ms Dict v2 (+/- 3ms) # Not significantly diff from v1
51ms Dict v1 (+/- 3ms)
50ms Set v3 (+/- 1ms) 
42ms Set v1 (+/- 2ms)
38ms Set v2 (+/- 1ms) # Enough of an edge to be statistically significant

Upvotes: 0

Mohammad sadegh borouny

Reputation: 421

Change the list to sets and change your code.

l = set.union(set(l1),set(l2), set(l3))
list(set(L_main)-l)

Upvotes: 0

Victor Moraes

Reputation: 41

Using a simple for can be quicker in this case.

final_list = []
for num in L_main:
    if num not in L1 and num not in L2 and num not in L3:
         final_list.append(num)

I know that semantically it is the same of the example in your question, but you can use time.time() to measure the time for big lists.

You can also change the .append() and use the += operator, it is also quicker:

final_list = []
for num in L_main:
    if num not in L1 and num not in L2 and num not in L3:
         final_list += [num]

Cheers.

EDIT: I also like a lot the set differences proposed, my answer is just in case you do not want to change the structure.

Upvotes: 0

Chris

Reputation: 16147

.difference is probably better, but worth noting that you can subtract sets as well.

set(L_main)-set(L1+L2+L3)

Upvotes: 0

Cory Kramer

Reputation: 117886

You can compute the set.difference of all the latter lists from the first one.

>>> set(L_main).difference(L1,L2,L3)
{4, 6}

Upvotes: 3

sacuL

Reputation: 51345

You can use the set differences between L_main and the rest of your lists concatentated together:

>>> list(set(L_main).difference(L1+L2+L3))
[4, 6]

Upvotes: 5

Find values in a list that are not in others lists

Answers (6)

Related Questions