user10520282

How to find a match of words in two strings?

I have two strings, for example:

My name is Bogdan
Bogdan and I am from Russia

I need to get the word Bogdan from these strings. I always know that the end of the first sentence is the same as the start of the second sentence.

How can I find this overlap?

My attempt returns the common characters instead of the common words:

res = list(set('My name is Bogdan').intersection(set('Bogdan and i am from Russia')))
print(res)

Returns

['i', 'n', 'g', 'm', ' ', 's', 'B', 'a', 'd', 'o']

Upvotes: 0

Views: 1737

Answers (3)

iGian

Reputation: 11183

Another option, with a for loop:

def shared_words(s1, s2):
  res = []
  l_s1, l_s2 = set(s1.split()), set(s2.split())
  for ss1 in l_s1:
    if ss1 in l_s2: res.append(ss1)
  return res

Applied to the strings:

s1 = "My name is Bogdan"
s2 = "Bogdan and I am from Russia"
print(shared_words(s1, s2)) #=> ['Bogdan']

Or, using a regex to extract only the words (so punctuation such as commas is ignored):

import re

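def shared_words(s1, s2):
  # Words of s2 in a set for fast membership tests
  words2 = set(re.findall(r'\w+', s2))
  res = []
  # Walk s1's words in order so the result order is deterministic
  for w in re.findall(r'\w+', s1):
    if w in words2 and w not in res:
      res.append(w)
  return res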

To get:

s1 = "My name is Bogdan, I am here"
s2 = "Bogdan and I am from Russia."
print(shared_words(s1, s2)) #=> ['Bogdan', 'I', 'am']
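Note that shared_words returns every word the two strings have in common, anywhere in them. The question guarantees that the end of the first string equals the start of the second. Under that guarantee, a stricter word-level check compares the trailing words of s1 with the leading words of s2, longest first. This is a minimal sketch that assumes whitespace-separated words with no punctuation handling. The name find_word_overlap is hypothetical.

```python
def find_word_overlap(s1, s2):
    # Assumption: words are separated by whitespace; punctuation is not stripped.
    w1, w2 = s1.split(), s2.split()
    # Try the longest possible word overlap first, then shrink it one word at a time.
    for k in range(min(len(w1), len(w2)), 0, -1):
        if w1[-k:] == w2[:k]:
            return " ".join(w1[-k:])
    return None  # no whole-word overlap

print(find_word_overlap("My name is Bogdan", "Bogdan and I am from Russia"))  # Bogdan
print(find_word_overlap("I like cats", "stop here"))  # None
```

Working on whole words means a partial word such as the "s" shared by "cats" and "stop" is never reported.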

Upvotes: 1

Graipher

Reputation: 7186

You start by overlapping the two strings maximally and then iterate by reducing the overlap:

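def find_overlap(s1, s2):
    # Start with the largest possible overlap (all of s1) and shrink it.
    for i in range(len(s1)):
        # Compare the suffix of s1 starting at i with the prefix of s2 of the same length
        test1, test2 = s1[i:], s2[:len(s1) - i]
        if test1 == test2:
            return test1
    return None  # no overlap found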

s1, s2 = "My name is Bogdan", "Bogdan and I am from Russia"
find_overlap(s1, s2)
# 'Bogdan'
s1, s2 = "mynameisbogdan", "bogdanand"
find_overlap(s1, s2)
# 'bogdan'

As you can see, this also works when the two strings contain no spaces.

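The first find_overlap runs at most n = len(s1) iterations. Each slice comparison is itself linear in the slice length, so the worst case is O(n²) character comparisons. The number of iterations could be reduced to min(n, m), with m = len(s2), if you first determine which of the two strings is shorter.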
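One way to apply the "shorter string" idea is to skip every starting position in s1 whose suffix is longer than s2, because such a suffix can never equal a prefix of s2. The sketch below keeps the longest-overlap-first order, so it should return the same result as find_overlap above while doing at most min(len(s1), len(s2)) comparisons. The name find_overlap_bounded is hypothetical.

```python
def find_overlap_bounded(s1, s2):
    # Suffixes of s1 longer than s2 can never match a prefix of s2,
    # so start at the first position whose suffix fits inside s2.
    start = max(0, len(s1) - len(s2))
    for i in range(start, len(s1)):
        suffix = s1[i:]
        # Longest candidate first, so the first match is the longest overlap.
        if s2.startswith(suffix):
            return suffix
    return None  # no overlap found

print(find_overlap_bounded("My name is Bogdan", "Bogdan and I am from Russia"))  # Bogdan
print(find_overlap_bounded("mynameisbogdan", "bogdanand"))  # bogdan
```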

If you expect the overlap to be much shorter than even the shorter of the two strings, you can instead start with a minimal overlap and grow it. Then only about k iterations are needed, where k is the length of the overlap:

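def find_overlap(s1, s2):
    # Grow the overlap from 1 character up to the length of the shorter string.
    # Note: this returns the *shortest* suffix of s1 that is a prefix of s2,
    # e.g. find_overlap("I saw bob", "bob ran") gives 'b', not 'bob'.
    for i in range(1, min(len(s1), len(s2)) + 1):
        test1, test2 = s1[-i:], s2[:i]
        if test1 == test2:
            return test1
    return None  # no overlap found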
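Each slice comparison above takes time proportional to the slice length, so these loops can be quadratic in the worst case. A guaranteed linear-time search for the longest overlap may matter in some cases. One standard technique for that is the Knuth-Morris-Pratt failure function: build it for s2, then scan s1 with it. This technique is swapped in here and is not part of this answer. The sketch below is minimal. The name longest_overlap is hypothetical, and the function returns '' rather than None when there is no overlap.

```python
def longest_overlap(s1, s2):
    """Longest suffix of s1 that is also a prefix of s2, in O(len(s1) + len(s2))."""
    # KMP failure function of s2: fail[i] is the length of the longest proper
    # prefix of s2[:i + 1] that is also a suffix of s2[:i + 1].
    fail = [0] * len(s2)
    k = 0
    for i in range(1, len(s2)):
        while k > 0 and s2[i] != s2[k]:
            k = fail[k - 1]
        if s2[i] == s2[k]:
            k += 1
        fail[i] = k
    # Scan s1: k is the length of the longest prefix of s2 ending at the current char.
    k = 0
    for ch in s1:
        # Fall back after a full match of s2 or on a mismatch.
        while k > 0 and (k == len(s2) or ch != s2[k]):
            k = fail[k - 1]
        if k < len(s2) and ch == s2[k]:
            k += 1
    return s2[:k]  # '' when there is no overlap

print(longest_overlap("I saw bob", "bob ran"))  # bob
```

Unlike the minimal-overlap loop above, this version always reports the longest overlap.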

Upvotes: 3

mad_

Reputation: 8273

You can use set intersection:

l1="My name is Bogdan"
l2="Bogdan and I am from Russia"
print(set(l1.split()) & set(l2.split()))  # {'Bogdan'}

Or with a list comprehension:

l1="My name is Bogdan"
l2="Bogdan and I am from Russia"
print([i for i in l1.split() if i in l2.split()])  # ['Bogdan']
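In the list comprehension, l2.split() is re-evaluated for every word of l1. Building the set of l2's words once makes each membership test an average constant-time set lookup, and the result still keeps l1's word order. This is a minimal sketch of that variant; the names words2 and common are only illustrative.

```python
l1 = "My name is Bogdan"
l2 = "Bogdan and I am from Russia"

words2 = set(l2.split())  # split l2 only once
common = [w for w in l1.split() if w in words2]  # keeps l1's word order
print(common)  # ['Bogdan']
```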

Upvotes: 1
