Minwu Yu

Reputation: 311

Python 3: compare two lists and find wildcard matches

list A: ['abc.txt', '123.txt', 'apple.jpg']

list B: ['ab', '123']

I want to generate a new list that contains only the items of list A that do not match any entry of list B, where an entry of B matches if it appears anywhere in the item (a wildcard match). The ideal output would be:

list C: ['apple.jpg']

Here is my code:

lista=['abc.txt', 'happy.txt', 'apple.jpg']
listb=['happy', 'ab']
listc=lista

for a in lista:
    for b in listb:
        print(a + ": " + b)
        if b in a:
            listc.remove(a)

print(listc)

The output of my code is:

abc.txt: happy
abc.txt: ab
apple.jpg: happy
apple.jpg: ab
['happy.txt', 'apple.jpg']

Does anyone know where it went wrong? And is there a better way to do it? Thanks.

Upvotes: 0

Views: 253

Answers (4)

Victor

Reputation: 612

The problem is here:

listc = lista

You're copying the reference, not the content, so listc and lista are the same list object. When you remove an element through listc, lista loses that element too, while you are still iterating over it.

If you want to copy the contents of lista into listc, you need to use:

import copy
listc = copy.copy(lista)

You can find more information here: How to clone or copy a list?
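
To make the difference between a reference and a copy concrete, here is a minimal sketch using the list from the question. The names alias and clone are only illustrative, they are not from the original code.

```python
import copy

lista = ['abc.txt', 'happy.txt', 'apple.jpg']

alias = lista             # plain assignment: a second name for the same list object
clone = copy.copy(lista)  # shallow copy: a new list holding the same strings

alias.remove('abc.txt')   # mutating through one name is visible through the other

print(alias is lista)     # True
print(lista)              # ['happy.txt', 'apple.jpg']
print(clone)              # ['abc.txt', 'happy.txt', 'apple.jpg']
```

Since the items are immutable strings, a shallow copy should be enough here.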

Upvotes: 0

Saeed Bolhasani

Reputation: 580

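In Python, assignment does not copy a list: listc=lista makes both names refer to the same list. You need to make a copy of lista for listc, and the copy library can help you. A deep copy works here, although a shallow copy is enough for a list of strings. Modify your code like this: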

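    import copy

    lista = ['abc.txt', 'happy.txt', 'apple.jpg']
    listb = ['happy', 'ab']
    listc = copy.deepcopy(lista)  # independent copy, safe to modify while looping over lista

    for a in lista:
        for b in listb:
            if b in a:
                listc.remove(a)
                break  # remove each item only once, even if several entries match

    print(listc)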

Upvotes: 0

Djaouad

Reputation: 22776

You could use this list comprehension, which keeps the elements of A that don't match anything in B (an element is kept only if it is not a substring of any of B's elements and none of B's elements is a substring of it):

lista = ['abc.txt', '123.txt', 'apple.jpg']
listb = ['ab', '123']

listc = [a for a in lista if all(a not in b and b not in a for b in listb)]
print(listc) # => ['apple.jpg']
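
If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the name exclude_matching is made up for illustration, and unlike the comprehension above it checks one direction only (whether an entry of B appears inside the item), which is what the code in the question does.

```python
def exclude_matching(names, fragments):
    """Return the names that contain none of the given fragments."""
    return [name for name in names
            if not any(fragment in name for fragment in fragments)]


lista = ['abc.txt', 'happy.txt', 'apple.jpg']
listb = ['happy', 'ab']

print(exclude_matching(lista, listb))  # ['apple.jpg']
print(lista)                           # the input list is left unchanged
```

Building a new list this way avoids removing items from a list while looping over it.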

Upvotes: 0

DYZ

Reputation: 57033

After the assignment listc=lista, both variables refer to the same list. As a result, you modify the list you are iterating over, which causes elements to be skipped (here, happy.txt is never visited). You should make a copy of the original list: listc=lista.copy().
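
The skipping can be reproduced in a few lines. This is a sketch with the data from the question; the visited and fixed lists are added only to show what happens and are not part of the original code.

```python
lista = ['abc.txt', 'happy.txt', 'apple.jpg']
listb = ['happy', 'ab']

visited = []
for a in lista:              # iterating over the list that is being modified
    visited.append(a)
    if any(b in a for b in listb):
        lista.remove(a)      # later items shift left, so the next one is skipped

print(visited)  # ['abc.txt', 'apple.jpg']
print(lista)    # ['happy.txt', 'apple.jpg']

# Iterating over a copy while removing from the original visits every item.
fixed = ['abc.txt', 'happy.txt', 'apple.jpg']
for a in fixed.copy():
    if any(b in a for b in listb):
        fixed.remove(a)

print(fixed)    # ['apple.jpg']
```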

Here's a better, regex-based solution to your problem:

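import re
lista = ['abc.txt', 'happy.txt', 'apple.jpg']
listb = ['happy', 'ab']
# Match anything in listb; re.escape keeps characters such as '.' literal
pattern = re.compile('|'.join(map(re.escape, listb)))
# Equivalent to re.compile('happy|ab')
# search() finds a match anywhere in the string; match() only checks the start
listc = [a for a in lista if not pattern.search(a)]
# ['apple.jpg']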
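
One thing to watch with the regex approach: pattern.match only tests the beginning of each string, while pattern.search looks anywhere in it, which is closer to the "b in a" test from the question. A small sketch of the difference follows; the file name my_happy.txt is a made-up example that does not appear in the question.

```python
import re

names = ['abc.txt', 'my_happy.txt', 'apple.jpg']  # 'my_happy.txt' is hypothetical
fragments = ['happy', 'ab']

# re.escape guards against fragments that contain regex metacharacters
pattern = re.compile('|'.join(re.escape(f) for f in fragments))

by_match = [n for n in names if not pattern.match(n)]    # anchored at the start
by_search = [n for n in names if not pattern.search(n)]  # anywhere in the string

print(by_match)   # ['my_happy.txt', 'apple.jpg']
print(by_search)  # ['apple.jpg']
```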

Upvotes: 1
