Given two strings `s1` and `s2`, I want to extract all overlapping spans (runs of consecutive words that appear in both strings) where each span is at least `K` words long.
For example, given

```python
s1 = "Today is Friday. Nice weather, isn't it?"
s2 = "It's Black Friday today. "
K = 1
```

the expected answer is

```python
spans = ["Friday"]  # case-sensitive: "Today" does not match "today"
```
Here is my implementation:
```python
def norm(s):
    punctuation = [",", ".", "?", "!"]
    s = s.split()
    for i, x in enumerate(s):
        # Split one trailing punctuation mark off the word: "Friday." -> "Friday ."
        if any([x.endswith(p) for p in punctuation]):
            s[i] = x[:-1] + " " + x[-1]
    s = " ".join(s)
    s = s.split()  # re-split so the punctuation marks become separate tokens
    return s

def func(s1, s2, K=1):
    punctuation = [",", ".", "?", "!"]
    s1 = norm(s1)
    s2 = norm(s2)
    spans = []
    for i, x in enumerate(s1):
        # j is the span length in tokens
        for j in range(K, len(s1) - K):
            cur_span = " ".join(s1[i:i+j])
            # substring test against the re-joined s2
            if cur_span in " ".join(s2):
                spans.append(cur_span)
    # drop spans that consist of a single punctuation mark
    spans = [x for x in spans if x not in punctuation]
    return spans

s1 = "Today is Friday. Nice weather, isn't it?"
s2 = "It's Black Friday today. "
func(s1, s2, 1)  # returns ['Friday']
```
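The `norm` step can be written as a single regular expression. This is a sketch, assuming a token is either a run of word characters and apostrophes or one of the four marks already listed in `punctuation`; `tokenize_with_punct` is a hypothetical name. It agrees with `norm` on the two example strings, but unlike `norm` it splits off every such mark, not just one trailing one (`"Hello!!"` gives `['Hello', '!', '!']` instead of `['Hello!', '!']`), and it discards any other character such as hyphens or quotes.

```python
import re

# Hypothetical helper: words (letters, digits, underscore, apostrophe) and the
# marks , . ? ! each become their own token; everything else (spaces,
# hyphens, quotes, ...) is discarded.
TOKEN_RE = re.compile(r"[\w']+|[.,?!]")

def tokenize_with_punct(s):
    return TOKEN_RE.findall(s)

print(tokenize_with_punct("Today is Friday. Nice weather, isn't it?"))
# ['Today', 'is', 'Friday', '.', 'Nice', 'weather', ',', "isn't", 'it', '?']
```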
I'm looking for a better implementation of this function.
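One possible improvement is to compare token sequences rather than substrings. `func` tests `cur_span in " ".join(s2)`, so a short word such as `"is"` would also match inside `"This"`, and its inner loop stops `j` below `len(s1) - K`, so longer spans are never tried. A minimal sketch, assuming punctuation can simply be dropped during tokenization and that every shared span (including sub-spans of longer ones) should be returned, as `func` does: each run of at least `K` tokens in `s2` is stored as a tuple in a set, and each run in `s1` is looked up in it. `tokenize` and `overlap_spans` are hypothetical names.

```python
import re

def tokenize(s):
    # Assumption: a token is a run of letters, digits, underscores or apostrophes;
    # punctuation is dropped, so no separate punctuation filter is needed.
    return re.findall(r"[\w']+", s)

def overlap_spans(s1, s2, K=1):
    """Return every run of >= K consecutive tokens that occurs in both strings.

    Comparison is case-sensitive and token-exact. Results are de-duplicated
    and kept in the order they first appear in s1.
    """
    K = max(K, 1)  # a span must contain at least one token
    t1, t2 = tokenize(s1), tokenize(s2)
    # Every contiguous token run of s2 with length >= K, as a tuple for set lookup
    s2_spans = {tuple(t2[i:j]) for i in range(len(t2)) for j in range(i + K, len(t2) + 1)}
    spans, seen = [], set()
    for i in range(len(t1)):
        # j runs up to len(t1) so spans reaching the end of s1 are included
        for j in range(i + K, len(t1) + 1):
            span = tuple(t1[i:j])
            if span in s2_spans and span not in seen:
                seen.add(span)
                spans.append(" ".join(span))
    return spans

s1 = "Today is Friday. Nice weather, isn't it?"
s2 = "It's Black Friday today. "
print(overlap_spans(s1, s2, 1))  # ['Friday']
print(overlap_spans("the black cat sat", "a black cat ran", 1))
# ['black', 'black cat', 'cat']
```

Each string contributes O(n²) spans and each tuple costs O(n) to build, so this is roughly cubic in the number of tokens, which is fine for sentence-length inputs.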
You can use `set.intersection` along with `split`:
```python
s1_set = {i.strip('.,?!') for i in s1.split()}
s2_set = {i.strip('.,?!') for i in s2.split()}
print(s1_set.intersection(s2_set))
# {'Friday'}
```
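The `set.intersection` approach above compares single words only, which covers `K = 1`. If multi-word spans are needed and only the maximal shared runs should be reported (so that a shared "Black Friday" appears once, not also as "Black" and "Friday"), a dynamic-programming table over tokens, i.e. the longest-common-substring recurrence applied to word lists, is one option. This is a sketch under that assumption about what "overlap spans" means; it reuses the answer's `split`/`strip` tokenization, and `maximal_common_spans` is a hypothetical name.

```python
def maximal_common_spans(s1, s2, K=1):
    """Return each maximal run of >= K consecutive tokens shared by s1 and s2."""
    # Same tokenization as the answer; tokens that strip to "" (a bare "?") are dropped
    t1 = [w.strip('.,?!') for w in s1.split() if w.strip('.,?!')]
    t2 = [w.strip('.,?!') for w in s2.split() if w.strip('.,?!')]
    n1, n2 = len(t1), len(t2)
    # run[i][j]: length of the common run ending at t1[i-1] and t2[j-1]
    run = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            if t1[i - 1] == t2[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
    spans = []
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            n = run[i][j]
            # The run already extends as far left as possible; report it only
            # where it cannot be extended by one more token to the right.
            right_end = i == n1 or j == n2 or run[i + 1][j + 1] == 0
            if n >= max(K, 1) and right_end:
                span = " ".join(t1[i - n:i])
                if span not in spans:
                    spans.append(span)
    return spans

print(maximal_common_spans("Today is Friday. Nice weather, isn't it?",
                           "It's Black Friday today. "))  # ['Friday']
print(maximal_common_spans("the black cat sat", "a black cat ran"))  # ['black cat']
```

The table costs O(len(t1) × len(t2)) time and memory, and comparison stays case-sensitive, matching the expected answer in the question.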