Ecaloota

Reputation: 71

Finding partial string matches between list and elements of list of lists

I have a list of strings:

mylist = ['foo hydro', 'bar']

and a list of lists of strings called test:

testI = ['foo', 'bar']             ## should succeed
testJ = ['foo']                    ## should fail
testK = ['foo hydro']              ## should fail
testL = ['foo hydro', 'bar']       ## should succeed
testM = ['foo', 'bar', 'third']    ## should fail

test = [testI,testJ,testK,testL,testM]

I need to be able to check if there's a (partial or whole) string match between each element of each list in test and each element of mylist.

So, testI should succeed because testI[0] is a partial string match of mylist[0] and because testI[1] is a complete string match for mylist[1].

However, testJ and testK should each fail because they only match one of the two strings in mylist, and testM should fail because it contains an element which doesn't match any element of mylist.

So far, I've tried to play around with any:

for i in mylist:
    for j in test:
        for k in j:
            if any(i in b for b in k):
                print("An element of mylist matches an element of test")

So I can catch if any element of mylist matches any element in each list in test, but I can't work out a way to meet all the requirements.

Any suggestions? I'm happy to refactor the question if it makes dealing with it easier.

Upvotes: 0

Views: 414

Answers (2)

Yanirmr

Reputation: 1032

I want to suggest a solution to your problem.

First, we create a function that checks whether a word is a substring of any string in another list:

def is_substring_of_element_in_list(word, list_of_str):
    # Return (True, index) for the first string that contains word,
    # or (False, -1) if no string does (including an empty list).
    matches = [word in s for s in list_of_str]
    if any(matches):
        return True, matches.index(True)
    return False, -1
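
As a quick check of what this helper returns, here is a minimal sketch with the same behaviour written as a single loop. The name find_first_container is only illustrative and is not used elsewhere in this answer:

```python
def find_first_container(word, candidates):
    """Return (True, index) of the first candidate containing word, else (False, -1)."""
    for index, candidate in enumerate(candidates):
        if word in candidate:
            return True, index
    return False, -1


mylist = ['foo hydro', 'bar']
print(find_first_container('foo', mylist))    # (True, 0)
print(find_first_container('third', mylist))  # (False, -1)
print(find_first_container('foo', []))        # (False, -1)
```

The index is what lets the caller remove the matched string afterwards.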

Now we can use this function to check whether each word in a test list is a substring of a string in your list. Note that each string in your list may be used only once, so as soon as a word matches a string we remove that string.

def is_list_is_in_mylist(t, mylist):
    # Work on length-sorted copies so the inputs are not modified.
    mylist_now = sorted(mylist, key=len)
    test_now = sorted(t, key=len)
    counter = 0
    for word in test_now:
        is_sub, index = is_substring_of_element_in_list(word, mylist_now)
        if is_sub:
            # Each string in mylist may be matched only once.
            mylist_now.pop(index)
            counter += 1
    # Succeed only if every word matched and no string in mylist is left over.
    if counter == len(t) and counter == len(mylist):
        print("success")
    else:
        print("fail")

Note that we sort the lists by length to avoid mistakes caused by the order of the words. For example, with my_list = ['foo', 'f'], test1 = ['f', 'foo'] and test2 = ['foo', 'f'], without sorting one of the tests would succeed and the other would fail.

Now you can iterate over your test with a simple for loop:

for t in test:
    is_list_is_in_mylist(t, mylist)
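
One caveat: taking the first available match is a greedy choice, and it can miss a valid pairing for some inputs (for example ['a', 'x'] against ['xa', 'ab']). If that matters, a possible alternative is to try every ordering of mylist. The following is only a sketch under that assumption; matches_one_to_one is a hypothetical name, and the cost grows factorially with the list length, so it is only reasonable for short lists:

```python
from itertools import permutations


def matches_one_to_one(candidate, reference):
    """True if every string in candidate is a substring of a distinct
    string in reference and no string in reference is left unmatched."""
    if len(candidate) != len(reference):
        return False
    # Try each ordering of reference and pair it positionally with candidate.
    return any(
        all(word in target for word, target in zip(candidate, ordering))
        for ordering in permutations(reference)
    )


mylist = ['foo hydro', 'bar']
test = [
    ['foo', 'bar'],
    ['foo'],
    ['foo hydro'],
    ['foo hydro', 'bar'],
    ['foo', 'bar', 'third'],
]
print([matches_one_to_one(t, mylist) for t in test])
# [True, False, False, True, False]
```

It returns a boolean instead of printing, which makes it easier to reuse in other code.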

Upvotes: 1

Bhargav Desai

Reputation: 1016

I think this code should meet your conditions:

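for t in test:
    counter = 0
    if len(t) == len(mylist):
        # Remove duplicates while keeping the original order.
        t = list(dict.fromkeys(t))
        # Keep only strings that are not substrings of another string in t.
        temp = []
        for s in t:
            if not any(s in r for r in t if s != r):
                temp.append(s)
        # Count the pairs where a kept string occurs in a string of mylist.
        for l in temp:
            for m in mylist:
                if l in m:
                    counter += 1
        if counter == len(mylist):
            print('success')
        else:
            print('fail')
    else:
        print('fail')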
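
For testing, the same logic can be wrapped in a function that returns a boolean. This is only a sketch: counts_match is a hypothetical name, and it is meant to return True exactly where the loop above reports success. Because it counts matching pairs rather than assigning each string to a distinct partner, it can accept a list even when one string of mylist is never matched, as the last call shows:

```python
def counts_match(candidate, reference):
    """Return True where the loop above would report success."""
    if len(candidate) != len(reference):
        return False
    # Remove duplicates while keeping the original order.
    unique = list(dict.fromkeys(candidate))
    # Keep only strings that are not substrings of another string in the list.
    kept = [s for s in unique if not any(s in r for r in unique if s != r)]
    # Count every (kept string, reference string) pair where the first occurs in the second.
    counter = sum(1 for s in kept for m in reference if s in m)
    return counter == len(reference)


mylist = ['foo hydro', 'bar']
test = [
    ['foo', 'bar'],
    ['foo'],
    ['foo hydro'],
    ['foo hydro', 'bar'],
    ['foo', 'bar', 'third'],
]
print([counts_match(t, mylist) for t in test])
# [True, False, False, True, False]

# Both 'foo' and 'bar' occur in 'foo bar', so the count reaches 2 although 'baz' is unmatched.
print(counts_match(['foo', 'bar'], ['foo bar', 'baz']))  # True
```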

Upvotes: 0
