Reputation: 9450
I have two lists as shown below:
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
I want to check every item in isl against every item in c so that I get all my partial string matches.
The output that I need will look like the below:
out = ["john", "query 989877", "tamm"]
As can be seen I have gotten the partial string matches as well.
I have tried the below:
out = []
for word in c:
    for w in isl:
        if word.lower() in w.lower():
            out.append(word)
But this only gives me the output as
out = ["John", "Tamm"]
I have also tried the below:
print [word for word in c if word.lower() in (e.lower() for e in isl)]
But this outputs only "John". How do I get what I want?
Upvotes: 4
Views: 5390
Reputation: 9450
Alright I have come up with this! An extremely hacky way to do it; I don't like the method myself but it gives me my output:
Step 1:
in:
    c1 = []
    for r in c:
        c1.append(r.split())
out:
    c1 = [['John'], ['query', '989877', 'forcast'], ['Tamm']]
Step 2:
in:
    p = []
    for w in isl:
        for word in c1:
            for w1 in word:
                if w1.lower() in w.lower():
                    p.append(w1)
out:
    p = ['query', '989877', 'John', 'Tamm']
Step 3:
in:
    out = []
    for word in c:
        t = []
        for i in p:
            if i in word:
                t.append(i)
        out.append(t)
out:
    out = [['John'], ['query', '989877'], ['Tamm']]
Step 4:
in:
    out_final = []
    for i in out:
        out_final.append(" ".join(i))
out:
    out_final = ['John', 'query 989877', 'Tamm']
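The four steps above can probably be folded into a single pass. This is only a minimal sketch: the function name matched_words is made up for illustration, and unlike Step 3 it keeps the matched words in the order they appear in each item of c rather than in the order they were found in isl. An item of c with no matching words ends up as an empty string.

```python
def matched_words(phrases, targets):
    """For each phrase, keep only the words found in at least one target."""
    # Lower-case the targets once so the comparison is case-insensitive.
    lowered = [t.lower() for t in targets]
    result = []
    for phrase in phrases:
        # Keep a word if it is a substring of any target string.
        kept = [w for w in phrase.split()
                if any(w.lower() in t for t in lowered)]
        result.append(" ".join(kept))
    return result


c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
out_final = matched_words(c, isl)
print(out_final)  # ['John', 'query 989877', 'Tamm']
```

Note that, like the steps above, this joins every matching word of an item, so the words it returns are not guaranteed to be adjacent in the original string.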
Upvotes: 0
Reputation: 250951
Perhaps something like this:
>>> def get_sub_strings(s):
...     words = s.split()
...     for i in xrange(1, len(words)+1):  # reverse the order here to get the longest sub-strings first
...         for n in xrange(0, len(words)+1-i):
...             yield ' '.join(words[n:n+i])
...
>>> out = []
>>> for word in c:
...     for sub in get_sub_strings(word.lower()):
...         for s in isl:
...             if sub in s.lower():
...                 out.append(sub)
...
>>> out
['john', 'query', '989877', 'query 989877', 'tamm']
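To make the sub-string generation concrete, here is a small sketch of what the generator yields for one item of c. It uses range instead of xrange so that it also runs on Python 3; that swap is an adaptation for illustration, not part of the code above.

```python
def get_sub_strings(s):
    """Yield every run of consecutive words in s, shortest runs first."""
    words = s.split()
    for size in range(1, len(words) + 1):          # run length in words
        for start in range(len(words) + 1 - size): # where the run begins
            yield ' '.join(words[start:start + size])


subs = list(get_sub_strings('query 989877 forcast'))
print(subs)
# ['query', '989877', 'forcast', 'query 989877', '989877 forcast',
#  'query 989877 forcast']
```

Each of these candidates is then tested against every string in isl, which is why both the single words and 'query 989877' show up in the output.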
If you want to store only the longest match, you need to generate the sub-strings in reverse order (longest first) and break as soon as a match is found in isl:
def get_sub_strings(s):
    words = s.split()
    # Start at len(words): starting at len(words)+1 only adds an empty pass.
    for i in xrange(len(words), 0, -1):
        for n in xrange(0, len(words)+1-i):
            yield ' '.join(words[n:n+i])

out = []
for word in c:
    for sub in get_sub_strings(word.lower()):
        # Longest sub-strings come first, so stop at the first match.
        if any(sub in s.lower() for s in isl):
            out.append(sub)
            break
print out
#['john', 'query 989877', 'tamm']
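The code above is Python 2 (xrange and the print statement). If you are on Python 3, something along these lines should work; it is a sketch of the same longest-match idea, and the wrapper name longest_matches is made up for illustration.

```python
def get_sub_strings(s):
    """Yield runs of consecutive words in s, longest runs first."""
    words = s.split()
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            yield ' '.join(words[start:start + size])


def longest_matches(phrases, targets):
    """Return, per phrase, the longest word run found in any target."""
    lowered = [t.lower() for t in targets]  # lower-case the targets once
    out = []
    for phrase in phrases:
        for sub in get_sub_strings(phrase.lower()):
            if any(sub in t for t in lowered):
                out.append(sub)
                break  # longest candidates come first, so stop here
    return out


c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
print(longest_matches(c, isl))  # ['john', 'query 989877', 'tamm']
```

A phrase with no match at all contributes nothing to the result, so the output can be shorter than c.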
Upvotes: 4