Land Owner

Reputation: 182

Filter a list of strings so that none contains any string from another list as a substring

I have the following code to select the values that do not contain any string from another list.

isbn = ["1111", "2222", "3333", "4444", "5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666", "k7 7777", "k8 8888", "k9 1111"]

for x in isbn:
    for i in sku:
        if x not in i:
            print(i)

The expected output is:

k6 6666
k7 7777
k8 8888

Instead, every value is printed for each ISBN it does not contain, so I get far more output than expected. How can I get the expected output shown above?

Upvotes: 2

Views: 880

Answers (3)

Moinuddin Quadri

Reputation: 48090

You should use any() within your loop. In fact, you can achieve this with the list comprehension below:

>>> list_1 = ["1111", "2222", "3333", "4444", "5555"]
>>> list_2 = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666", "k7 7777", "k8 8888", "k9 1111"]

>>> [x for x in list_2 if not any(y in x for y in list_1)]
['k6 6666', 'k7 7777', 'k8 8888']

Here any() returns True if any string in list_1 is present as a substring of the current element of list_2. As soon as it finds a match, it short-circuits the iteration (without checking the remaining strings) and returns True.
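To make the short-circuiting visible, here is a minimal sketch. The helper `contains` and the list `checked` are hypothetical names introduced only for this illustration; they record which substrings were actually tested before any() stopped.

```python
list_1 = ["1111", "2222", "3333", "4444", "5555"]

checked = []  # every substring that was actually tested


def contains(sub, text):
    # Hypothetical helper: log the check, then do the substring test.
    checked.append(sub)
    return sub in text


# any() consumes the generator lazily and stops at the first True.
result = any(contains(y, "k2 2222") for y in list_1)

print(result)   # True
print(checked)  # ['1111', '2222']
```

Only the first two entries of list_1 are tested; the remaining three are never looked at.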

If you would rather not use any(), you can get the same result with the for loop below:

for x in list_2:
    for y in list_1:
        if y in x:
            break
    else:
        print(x)

which will print your desired output:

k6 6666
k7 7777
k8 8888
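The loop relies on Python's for ... else: the else clause runs only when the inner loop finishes without hitting break, that is, when no string from list_1 was found. A minimal sketch that collects the values instead of printing them (the function name `unmatched` is an assumption made for this example):

```python
def unmatched(candidates, needles):
    """Return the candidates that contain none of the needles as a substring."""
    kept = []
    for x in candidates:
        for y in needles:
            if y in x:
                break  # a match was found; skip the else clause
        else:
            # Reached only if the inner loop never hit break.
            kept.append(x)
    return kept


list_1 = ["1111", "2222", "3333", "4444", "5555"]
list_2 = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555",
          "k6 6666", "k7 7777", "k8 8888", "k9 1111"]

print(unmatched(list_2, list_1))  # ['k6 6666', 'k7 7777', 'k8 8888']
```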

Upvotes: 8

Martijn Pieters

Reputation: 1123360

You would need to test all values in isbn before you can conclude none of those match.

Rather than loop over isbn first, loop over sku and test that value with each of the isbn values; the any() function makes that easier and more efficient:

for value in sku:
    if not any(i in value for i in isbn):
        print(value)

More efficient still would be to split out the ISBN portion, and test against a set:

isbn_set = set(isbn)
for value in sku:
    isbn_part = value.partition(' ')[-1]  # everything after the first space
    if isbn_part not in isbn_set:
        print(value)

This avoids looping over isbn altogether; set membership testing takes O(1) constant time on average, so for N SKUs and M ISBN values this is an O(N) loop (versus an O(NM) loop with any()).

Either version can be converted to a list comprehension to produce a list of the unmatched values; the preferred set version then becomes:

isbn_set = set(isbn)
not_matched = [value for value in sku if value.partition(' ')[-1] not in isbn_set]

Demo of the latter:

>>> isbn  = ["1111","2222","3333","4444","5555"]
>>> sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666", "k7 7777", "k8 8888" ,"k9 1111"]
>>> isbn_set = set(isbn)
>>> [value for value in sku if value.partition(' ')[-1] not in isbn_set]
['k6 6666', 'k7 7777', 'k8 8888']
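One caveat worth illustrating: the set version is an exact-match test on the text after the first space, while the any() version is a true substring test. The sketch below wraps both in functions (the function names and the extra value "k10 1111-A" are made up for this example) to show that they agree on the sample data but can differ on other input.

```python
def filter_by_substring(skus, isbns):
    # O(N*M): keep a SKU only if no ISBN occurs anywhere inside it.
    return [s for s in skus if not any(i in s for i in isbns)]


def filter_by_exact_part(skus, isbns):
    # O(N): keep a SKU only if the text after the first space
    # is not exactly equal to one of the ISBNs.
    isbn_set = set(isbns)
    return [s for s in skus if s.partition(' ')[-1] not in isbn_set]


isbn = ["1111", "2222", "3333", "4444", "5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555",
       "k6 6666", "k7 7777", "k8 8888", "k9 1111"]

# Both agree on the sample data from the question.
print(filter_by_substring(sku, isbn) == filter_by_exact_part(sku, isbn))  # True

# Hypothetical extra SKU whose trailing part merely contains an ISBN.
extra = sku + ["k10 1111-A"]
print(filter_by_substring(extra, isbn))   # ['k6 6666', 'k7 7777', 'k8 8888']
print(filter_by_exact_part(extra, isbn))  # ['k6 6666', 'k7 7777', 'k8 8888', 'k10 1111-A']
```

Which behaviour is right depends on the data: if every SKU has the form "<key> <isbn>", the set version is both faster and stricter.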

Upvotes: 2

Stephen Rauch

Reputation: 49832

If you remove the matches from a set, then the leftover set is what you are after:

Code:

skus = set(sku)
for x in isbn:
    skus -= {i for i in skus if x in i}

Test Code:

isbn = ["1111", "2222", "3333", "4444", "5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666",
       "k7 7777", "k8 8888", "k9 1111"]

skus = set(sku)
for x in isbn:
    skus -= {i for i in skus if x in i}
print(skus)

Results (sets are unordered, so the elements may print in a different order):

{'k6 6666', 'k7 7777', 'k8 8888'}
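Because a set has no defined order and drops duplicates, the printed result may not follow the order of sku. If a predictable order matters, one option is to sort the leftover set, as in this sketch (the function name `remove_matches` is an assumption for the example; sorting is a swapped-in step, not part of the code above):

```python
def remove_matches(skus, isbns):
    """Remove every SKU containing any ISBN, then return the rest sorted."""
    remaining = set(skus)
    for x in isbns:
        # Build the set of matches first, then subtract it in place.
        remaining -= {i for i in remaining if x in i}
    return sorted(remaining)


isbn = ["1111", "2222", "3333", "4444", "5555"]
sku = ["k1 1111", "k2 2222", "k3 3333", "k4 4444", "k5 5555", "k6 6666",
       "k7 7777", "k8 8888", "k9 1111"]

print(remove_matches(sku, isbn))  # ['k6 6666', 'k7 7777', 'k8 8888']
```

Note that sorting gives alphabetical order, which only coincides with the original list order for this sample data.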

Upvotes: 0
