Sven

Reputation: 21

Filter a list based on values compared against a string

First off, this is my first question on here so if I do something wrong please let me know.

OK, now to the problem. I am trying to search a list of words and remove any word that doesn't contain a given letter or set of letters, in no particular order. I have tried using the in operator as shown below. The list_to_clean shown here is just the first few words as an example; the real list is about 15,000 values long.

list_to_clean = ['aalii', 'aargh', 'aaron', 'abaci', 'abada', 'abaft', 'abama', 'aband', 'abash', 'abate', 'abave', 'abbas', 'abbes', 'abbot', 'abdat', 'abeam', 'abear', 'abele', 'aberr', 'abhor', 'abidi', 'abyes', 'abime', 'abyss', 'abkar', 'abler', 'ablet', 'abm', 'aboma', 'abord', 'abort', 'about', 'abray', 'abram', 'abret', 'abrim']
cleanby = "ar"

def list_cleaner(cleanby:str, list_to_clean:list):
    dict = list_to_clean
    for letter in cleanby:
        for word in dict:
            if letter in word:
                nothing = 1
            else:
                dict.remove(word)
    return(dict)

I have also tried using re.

from re import search

def list_cleaner(cleanby:str, list_to_clean:list):
    dict = list_to_clean
    for letter in cleanby:
        for word in dict:
            if search(letter, word):
                nothing = 1
            else:
                dict.remove(word)
    return(dict)

I don't know what the problem is, but it works great for about the first 1000 or so words. Then, after a little while, it stops working and lets words through that don't contain my "key" letters. I am sure there is a really simple reason for this, but I am new to Python and programming, so really basic things stump me.

Thanks in advance for all of y'all's help.

Upvotes: 0

Views: 114

Answers (3)

Sven

Reputation: 21

It turns out the problem was that I was modifying a list that I was iterating through. Someone kindly brought this to my attention in the comments. Don't know how I missed it tbh.
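For anyone who lands here with the same symptom, here is a minimal sketch of why removing from a list while looping over it lets some words slip through. The four sample words are made up for illustration and are not the real 15,000-word list.

```python
# Hypothetical sample data, chosen so the skipped word is easy to spot.
words = ["abaci", "abada", "aargh", "abbot"]

# Buggy pattern: removing from the same list that is being iterated.
buggy = words.copy()
for word in buggy:
    if "r" not in word:
        # Removing shifts the later items left by one, so the loop's
        # internal index jumps over the item that follows.
        buggy.remove(word)

# Safer pattern: iterate over a snapshot and remove from the original.
fixed = words.copy()
for word in fixed.copy():
    if "r" not in word:
        fixed.remove(word)

print(buggy)  # ['abada', 'aargh'] -> 'abada' has no "r" but was skipped
print(fixed)  # ['aargh']
```

Note that a plain assignment such as dict = list_to_clean does not make a copy; both names refer to the same list, which is why the original function ran into this.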

In case you are wondering, this is the full function now that it is working.

def list_cleaner(cleanby:str, list_to_clean:list):
    """Returns a filtered list of strings that contain all of the letters in the cleanby string.

    Args:
        cleanby (str): Letters you want contained in the returned strings
        list_to_clean (list): List that you want cleaned.
    """
    # Work on a copy so the list being iterated is never modified.
    dict = list_to_clean.copy()
    for letter in cleanby:
        for word in list_to_clean:
            if letter not in word:
                try:
                    dict.remove(word)
                except ValueError:
                    # The word was already removed for an earlier letter.
                    pass
    return dict

P.S. I know it is generally not smart to use dict as a variable name, but it makes sense in the larger scheme of things.

Upvotes: 0

Mark

Reputation: 92461

Consider using sets for this. If you want to know if the letters "ar" exist in the word "alright" you can use:

set("ar").issubset("alright")
#True

set("ar").issubset("any")
#False

Put together, a simple list comprehension selects the words that are missing at least one of those letters:

list_to_clean = ['aalii', 'aargh', 'aaron', 'abaci', 'abada']
cleanby = "ar"

[word for word in list_to_clean if not set(cleanby).issubset(word)]
# ['aalii', 'abaci', 'abada']

You can make it a tiny bit more efficient by making the set outside:

letter_set = set(cleanby)

[word for word in list_to_clean if not letter_set.issubset(word)]

So you should be able to simply return the list if you still want the function:

list_to_clean = ['aalii', 'aargh', 'aaron', 'abaci', 'abada', 'cright', 'riant', 'ra']
cleanby = "ar"

def list_cleaner(cleanby:str, list_to_clean:list):
    letter_set = set(cleanby)
    return [word for word in list_to_clean if not letter_set.issubset(word)]

list_cleaner(cleanby, list_to_clean)
# ['aalii', 'abaci', 'abada', 'cright']
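The function above returns the words that lack at least one of the letters. The question reads as wanting the opposite, that is, keeping only the words that contain every letter. If so, a sketch of that variant simply drops the not. The function name keep_words_with_letters is made up for this example.

```python
def keep_words_with_letters(required_letters: str, words: list) -> list:
    """Return the words that contain every letter in required_letters."""
    letter_set = set(required_letters)
    # issubset accepts any iterable, so each word can be passed directly.
    return [word for word in words if letter_set.issubset(word)]


sample = ['aalii', 'aargh', 'aaron', 'abaci', 'abada', 'cright', 'riant', 'ra']
kept = keep_words_with_letters("ar", sample)
print(kept)  # ['aargh', 'aaron', 'riant', 'ra']
```

Because the letters go into a set, repeated letters in the filter string are not counted: "aa" behaves the same as "a".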

Upvotes: 1

Chris

Reputation: 36680

You'd be better off thinking about this as generating a new list containing the words you want to keep, rather than removing words from the initial list.

That can be done with a straightforward list comprehension.

[w for w in list_to_clean if not all(ltr in w for ltr in set(cleanby))]
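As written, that expression selects the words that are missing at least one letter. If it helps to see both groups side by side, here is a rough sketch that splits the list in one pass using the same all(...) test. The name split_by_letters is hypothetical, not something from the question.

```python
def split_by_letters(required_letters: str, words: list) -> tuple:
    """Return (kept, dropped): words containing every required letter, and the rest."""
    kept, dropped = [], []
    for w in words:
        # all(...) stops at the first required letter that is missing from w.
        if all(ltr in w for ltr in required_letters):
            kept.append(w)
        else:
            dropped.append(w)
    return kept, dropped


kept, dropped = split_by_letters("ar", ['aalii', 'aargh', 'aaron', 'abaci', 'abada'])
print(kept)     # ['aargh', 'aaron']
print(dropped)  # ['aalii', 'abaci', 'abada']
```

Either way, the input list is never modified, which sidesteps the skipped-item problem entirely.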

Upvotes: 0
