user1452759

Reputation: 9450

Python: Check for partial match of strings between two lists

I have two lists as shown below:

c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']

I want to check every item in isl against every item in c so that I get all partial string matches. The output I need looks like this:

out = ["john", "query 989877", "tamm"]

As you can see, the partial string matches (such as "query 989877") are included as well.

I have tried the below:

out = []
for word in c:
    for w in isl:
        if word.lower() in w.lower():
            out.append(word)

But this gives me only:

out = ["John", "Tamm"]

I have also tried the below:

print [word for word in c if word.lower() in (e.lower() for e in isl)]

But this outputs only "John". How do I get what I want?

Upvotes: 4

Views: 5390

Answers (2)

user1452759

Reputation: 9450

Alright, I have come up with this! It is an extremely hacky way to do it, and I don't like the method myself, but it gives me the output I want:

Step1:
in: c1 = []
    for r in c:
        c1.append(r.split())
out: c1 = [['John'], ['query', '989877', 'forcast'], ['Tamm']]


Step2:
in: p = []
    for w in isl:
        for word in c1:
            for w1 in word:
                if w1.lower() in w.lower():
                    p.append(w1)
out: p = ['query', '989877', 'John', 'Tamm']


Step3:
in: out = []
    for word in c:
        t = []
        for i in p:
            if i in word:
                t.append(i)
        out.append(t)
out: out = [['John'], ['query', '989877'], ['Tamm']]

Step4:
in: out_final = []
    for i in out:
        out_final.append(" ".join(i))
out: out_final = ['John', 'query 989877', 'Tamm']
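
The four steps can also be folded into a single function. This is only a sketch of the same word-by-word idea, not a drop-in equivalent for every input; the name `partial_matches` is made up for illustration, and it assumes c and isl are the lists from the question. It is written so that it should run on both Python 2 and Python 3.

```python
def partial_matches(c, isl):
    """For each entry of c, keep the words that occur (case-insensitively)
    as a substring of at least one entry of isl."""
    haystacks = [s.lower() for s in isl]
    result = []
    for phrase in c:
        # Keep each word of the phrase that appears somewhere in isl.
        kept = [w for w in phrase.split()
                if any(w.lower() in h for h in haystacks)]
        result.append(" ".join(kept))
    return result


c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
print(partial_matches(c, isl))
```

Like the step-by-step version, this checks each word on its own, so the kept words are not guaranteed to appear next to each other in isl, and a phrase with no matching words contributes an empty string.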

Upvotes: 0

Ashwini Chaudhary

Reputation: 250951

Perhaps something like this:

>>> def get_sub_strings(s):
...     words = s.split()
...     for i in xrange(1, len(words)+1):      # reverse the order here
...         for n in xrange(0, len(words)+1-i):
...             yield ' '.join(words[n:n+i])
...
>>> out = []
>>> for word in c:
...     for sub in get_sub_strings(word.lower()):
...         for s in isl:
...             if sub in s.lower():
...                 out.append(sub)
...
>>> out
['john', 'query', '989877', 'query 989877', 'tamm']
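
To see which candidates the generator produces, it can help to list them for a single entry. The snippet below is a small check under the assumption that you are on Python 3 (so `range` replaces `xrange`); the variable name `candidates` is only for illustration.

```python
def get_sub_strings(s):
    """Yield every run of consecutive words in s, shortest runs first."""
    words = s.split()
    for i in range(1, len(words) + 1):          # i = number of words in the run
        for n in range(0, len(words) + 1 - i):  # n = index of the first word
            yield ' '.join(words[n:n + i])


candidates = list(get_sub_strings('query 989877 forcast'))
print(candidates)
# ['query', '989877', 'forcast', 'query 989877', '989877 forcast', 'query 989877 forcast']
```

Only runs of adjacent words are generated, which is why 'query forcast' never shows up as a candidate.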

If you want to store only the longest match, generate the sub-strings in reverse order (longest first) and break as soon as a match is found in isl:

def get_sub_strings(s):
    words = s.split()
    for i in xrange(len(words), 0, -1):    # longest sub-strings first
        for n in xrange(0, len(words)+1-i):
            yield ' '.join(words[n:n+i])

out = []
for word in c:
    for sub in get_sub_strings(word.lower()):
        if any(sub in s.lower() for s in isl):
            out.append(sub)
            break

print out
#['john', 'query 989877', 'tamm']
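
On Python 3, `xrange` and the `print` statement no longer exist. A possible port of the longest-match version is sketched below; the wrapper name `longest_matches` is hypothetical, and it assumes the same c and isl as in the question.

```python
def get_sub_strings(s):
    """Yield runs of consecutive words in s, longest runs first."""
    words = s.split()
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            yield ' '.join(words[start:start + size])


def longest_matches(c, isl):
    """For each entry of c, return the first (longest) sub-string found in isl."""
    targets = [s.lower() for s in isl]
    out = []
    for phrase in c:
        for sub in get_sub_strings(phrase.lower()):
            if any(sub in t for t in targets):
                out.append(sub)
                break  # stop at the longest match for this phrase
    return out


c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
print(longest_matches(c, isl))
# ['john', 'query 989877', 'tamm']
```

Entries of c with no match at all are simply skipped, so the result can be shorter than c.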

Upvotes: 4
