Reputation: 372
I am struggling to write a regex that matches A to B when they share at least two words, as in case 1. I have already found a way to exclude "does" or any dictionary word from input A, so case 2 is not a problem. In case 1, A should match B on the words "Wakanda" and "exist", assuming that words like "do", "in", and "the" have already been removed.
CASE 1
A -> Do Wakanda exist in the world?
B -> Does Wakanda exist?
>> A should match B
exclude = ['do', 'in', 'the']
A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"
split_A = A.lower().split()
final_A = [i if i not in exclude else '' for i in split_A]
A = " ".join(' '.join(final_A).strip().split())
CASE 1 (after removing the excluded words from A)
A -> wakanda exist world?
B -> Does Wakanda exist?
>> A should match B
CASE 2
A -> Does Atlantis exist in our world?
B -> Does Wakanda exist?
>> A should not match B
Upvotes: 1
Views: 76
Reputation: 6721
This is a more "pure" regex solution, if it runs in whatever regex engine you're using.
Concatenate your strings with "||" and attempt to match with this regex:
(?i).*?(\b\w+\b).*?(\b\w+\b).*?\|\|(?:.*\b\1\b.*\b\2\b.*|.*\b\2\b.*\b\1\b.*)
So, run on the string "wakanda exist world||Does Wakanda exist?", it would match with two groups: wakanda and exist.
If you run it on "wakanda xist ello world||does exist wakanda hello", it would not match, because only wakanda appears on both sides of the "||".
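For Python readers, here is a minimal sketch of how the regex above could be tried with the standard re module. The helper name shared_pair is hypothetical, and the sketch assumes that neither sentence already contains "||". Backreferences honor the (?i) flag in Python, so "wakanda" in the first half can match "Wakanda" in the second.

```python
import re

# The regex from the answer, split over two raw strings for readability.
PAIR_PATTERN = re.compile(
    r"(?i).*?(\b\w+\b).*?(\b\w+\b).*?\|\|"
    r"(?:.*\b\1\b.*\b\2\b.*|.*\b\2\b.*\b\1\b.*)"
)

def shared_pair(a, b):
    """Return two words of `a` that both occur in `b`, or None (hypothetical helper)."""
    m = PAIR_PATTERN.match(a + "||" + b)
    return m.groups() if m else None

print(shared_pair("wakanda exist world", "Does Wakanda exist?"))
print(shared_pair("wakanda xist ello world", "does exist wakanda hello"))
```

The first call should print the pair ('wakanda', 'exist') and the second None. On long sentences the nested wildcards can backtrack heavily, so this is probably best kept to short inputs.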
Alternatively, turn "wakanda exist world?" into "\bwakanda\b|\bexist\b|\bworld\b" however you like, and run it on the second string. You get a match, such as wakanda; then remove wakanda from your list and run it again. If you get a second match, then you're good.
Since you haven't specified Python as a language tag and I don't know Python, I'm going to provide JavaScript to do this, and you can adapt it if you need to:
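var simplifiedSentence1 = "wakanda exist world?";
var simplifiedSentence2 = "Does Wakanda exist?";
// Build \bwakanda\b|\bexist\b|\bworld\b from the first sentence
var matchExp = new RegExp(".*?("
    + simplifiedSentence1
        .replace(/\W+/g, "|")
        .replace(/^\||\|$/g, "")  // trim a leading and/or trailing "|"
        .replace(/(\w+)/g, "\\b$1\\b")
    + ")", "i");
var firstMatch = matchExp.exec(simplifiedSentence2);
var TwoWordsMatched = false;
if (firstMatch !== null) {
    // Remove every occurrence of the first matched word, then look for a second match
    var matchExp2 = new RegExp("\\b" + firstMatch[1] + "\\b\\W*", "gi");
    TwoWordsMatched = matchExp.test(simplifiedSentence2.replace(matchExp2, ""));
}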
TwoWordsMatched will be true if two words match between the two statements, and false if one or fewer match.
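Since the question's own code is Python, here is a rough Python sketch of the same match, remove, match-again idea. It is not a line-by-line port of the JavaScript: it removes the matched word from the word list, as the description suggests, rather than from the sentence. The function name two_words_matched is hypothetical.

```python
import re

def two_words_matched(sentence_a, sentence_b):
    """True if at least two distinct words of sentence_a occur in sentence_b."""
    words = set(re.findall(r"\w+", sentence_a.lower()))
    hits = 0
    while words and hits < 2:
        # Build \bword1\b|\bword2\b|... from the words still in the list.
        pattern = re.compile(
            "|".join(r"\b" + re.escape(w) + r"\b" for w in sorted(words)),
            re.IGNORECASE,
        )
        m = pattern.search(sentence_b)
        if m is None:
            break
        # Drop the matched word so the next pass must find a different one.
        words.discard(m.group(0).lower())
        hits += 1
    return hits >= 2

print(two_words_matched("wakanda exist world?", "Does Wakanda exist?"))
print(two_words_matched("atlantis exist our world?", "Does Wakanda exist?"))
```

The first call should print True (wakanda and exist are both found) and the second False (only exist is found).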
Upvotes: 0
Reputation: 195543
You can use set operations to see if two sentences match. There is no need to use regex, but you need to do some preprocessing (remove "?", put the sentence in lowercase, etc.):
A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"
A2 = "Does Atlantis exist in our world?"
B2 = "Does Wakanda exist?"
exclude = ['do', 'in', 'the', 'does']
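def a_match_b(a, b):
    # Strip "?", lowercase, split into words, and drop the excluded words
    a = set(a.replace('?', '').lower().split()) - set(exclude)
    b = set(b.replace('?', '').lower().split()) - set(exclude)
    # A matches B when they share at least two words
    return len(a.intersection(b)) > 1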
print(a_match_b(A, B))
print(a_match_b(A2, B2))
Output is:
True
False
Edit:
As @tobias_k said, you can use regexp to find the words, so you can alternatively use:
import re
A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"
A2 = "Does Atlantis exist in our world?"
B2 = "Does Wakanda exist?"
exclude = ['do', 'in', 'the', 'does']
def a_match_b(a, b):
    # \w+ picks out the words and skips punctuation such as "?"
    words_a = re.findall(r'\w+', a.lower())
    words_b = re.findall(r'\w+', b.lower())
    a = set(words_a) - set(exclude)
    b = set(words_b) - set(exclude)
    return len(a.intersection(b)) > 1
print(a_match_b(A, B))
print(a_match_b(A2, B2))
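If it helps to see why two sentences match, the same set logic can return the shared words instead of just a boolean. This is only a sketch: the names shared_words and min_shared are hypothetical additions, and the exclude list is the one used above.

```python
import re

EXCLUDE = {"do", "in", "the", "does"}  # same excluded words as above

def shared_words(a, b, exclude=EXCLUDE):
    """Words that occur in both sentences once the excluded words are dropped."""
    words_a = set(re.findall(r"\w+", a.lower())) - exclude
    words_b = set(re.findall(r"\w+", b.lower())) - exclude
    return words_a & words_b

def a_match_b(a, b, min_shared=2):
    # min_shared is a hypothetical knob; 2 reproduces the "> 1" test above.
    return len(shared_words(a, b)) >= min_shared

print(sorted(shared_words("Do Wakanda exist in the world?", "Does Wakanda exist?")))
print(sorted(shared_words("Does Atlantis exist in our world?", "Does Wakanda exist?")))
```

The first call should list exist and wakanda, the second only exist, which is why case 2 does not count as a match.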
Upvotes: 1