Ninja Warrior 11

Reputation: 372

Match at least two words from an input against a statement

I am struggling to write a regex that decides whether input A matches statement B when the two share at least two words. I have already found a way to exclude "does" (or any other listed word) from input A, so case 2 is not a problem. In case 1, A should match B because both contain Wakanda and exist, assuming words like do, in, and the have already been removed.

CASE 1
A -> Do Wakanda exist in the world?
B -> Does Wakanda exist?
>> A should match B

exclude = ['do', 'in', 'the']
A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"
# Lowercase and split on whitespace, then drop the excluded words
split_A = A.lower().split()
final_A = [w for w in split_A if w not in exclude]
# Rejoin with single spaces: "wakanda exist world?"
A = " ".join(final_A)

CASE 1 (after removing the excluded words)
A -> wakanda exist world?
B -> Does Wakanda exist?
>> A should match B

CASE 2
A -> Does Atlantis exist in our world?
B -> Does Wakanda exist?
>> A should not match B

Upvotes: 1

Views: 76

Answers (2)

Code Jockey

Reputation: 6721

EDIT:

This is a more "pure" regex solution, if it runs in whatever regex engine you're using:

Concatenate your two strings with "||" and try to match the result against this regex:

(?i).*?(\b\w+\b).*?(\b\w+\b).*?\|\|(?:.*\b\1\b.*\b\2\b.*|.*\b\2\b.*\b\1\b.*)

So, run on the string wakanda exist world||Does Wakanda exist?, it matches and captures two groups: wakanda and exist.

Run on wakanda xist ello world||does exist wakanda hello, it does not match, because wakanda is the only word the two halves share.
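Since the question's own code is Python, here is a minimal sketch of how the same pattern could be tried with Python's re module. The helper name two_common_words is made up for illustration, and it assumes neither input contains "||" itself.

```python
import re

# The pattern from above, unchanged. Groups 1 and 2 capture two words from
# the first half; the backreferences require both to reappear, in either
# order, as whole words after the "||" separator.
PATTERN = re.compile(
    r"(?i).*?(\b\w+\b).*?(\b\w+\b).*?\|\|"
    r"(?:.*\b\1\b.*\b\2\b.*|.*\b\2\b.*\b\1\b.*)"
)

def two_common_words(a, b):
    """Return a pair of words shared by a and b, or None (hypothetical helper)."""
    m = PATTERN.search(a + "||" + b)
    return m.groups() if m else None

print(two_common_words("wakanda exist world", "Does Wakanda exist?"))
print(two_common_words("wakanda xist ello world", "does exist wakanda hello"))
```

The first call should print the captured pair and the second should print None. The pattern backtracks heavily, so it is probably best kept to short sentences.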

Another, more verbose but more scalable solution:

Turn "wakanda exist world?" into "\bwakanda\b|\bexist\b|\bworld\b" however you like and run it against the second string. That gives you a first match, such as wakanda. Remove wakanda from your list of words and run it again. If you get a second match, you're good.

Since you haven't specified Python as a language tag and I don't know Python, I'm going to provide JavaScript to do this, and you can adapt it if you need to.

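var simplifiedSentence1 = "wakanda exist world?";
var simplifiedSentence2 = "Does Wakanda exist?";

// Build an alternation of whole words: \bwakanda\b|\bexist\b|\bworld\b
var matchExp = new RegExp(".*?("
    + simplifiedSentence1
        .replace(/\W+/g, "|")
        .replace(/^\||\|$/g, "")   // trim a leading and/or trailing "|"
        .replace(/(\w+)/g, "\\b$1\\b")
    + ")", "i");
var firstMatch = matchExp.exec(simplifiedSentence2);
var TwoWordsMatched = false;
if (firstMatch !== null) {
    // Remove the first matched word from sentence 2, then look for a second one
    var matchExp2 = new RegExp("\\b" + firstMatch[1] + "\\b\\W*", "i");
    TwoWordsMatched = matchExp.test(simplifiedSentence2.replace(matchExp2, ""));
}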

TwoWordsMatched will be true if at least two words match between the two statements, and false if one or none match.
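For anyone adapting this to Python, a rough sketch of the same two-pass idea might look like the following. It follows the prose description (removing the matched word from the word list) rather than translating the JavaScript line by line, and the names two_words_match and first_hit are invented for this example.

```python
import re

def two_words_match(simplified_a, b):
    """True if at least two distinct words of simplified_a occur in b."""
    words = re.findall(r"\w+", simplified_a.lower())

    def first_hit(candidates):
        # Build \bword1\b|\bword2\b|... and return the first word found in b.
        if not candidates:
            return None
        pattern = "|".join(r"\b" + re.escape(w) + r"\b" for w in candidates)
        m = re.search(pattern, b, re.IGNORECASE)
        return m.group(0).lower() if m else None

    first = first_hit(words)
    if first is None:
        return False
    # Drop the word that already matched and try again with the rest.
    remaining = [w for w in words if w != first]
    return first_hit(remaining) is not None

print(two_words_match("wakanda exist world?", "Does Wakanda exist?"))      # True
print(two_words_match("atlantis exist our world?", "Does Wakanda exist?")) # False
```

Removing the word from the list instead of from the sentence means a word repeated in the second sentence is not counted twice.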

Upvotes: 0

Andrej Kesely

Reputation: 195543

You can use set operations to check whether two sentences match. There is no need for regex, but you do need some preprocessing: remove the ?, lowercase the sentence, and so on:

A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"

A2 = "Does Atlantis exist in our world?"
B2 = "Does Wakanda exist?"

exclude = ['do', 'in', 'the', 'does']

def a_match_b(a, b):
    a = set(a.replace('?', '').lower().split()) - set(exclude)
    b = set(b.replace('?', '').lower().split()) - set(exclude)
    return len(a.intersection(b)) > 1

print(a_match_b(A, B))
print(a_match_b(A2, B2))

Output is:

True
False

Edit:

As @tobias_k said, you can use regexp to find the words, so you can alternatively use:

import re

A = "Do Wakanda exist in the world?"
B = "Does Wakanda exist?"

A2 = "Does Atlantis exist in our world?"
B2 = "Does Wakanda exist?"

exclude = ['do', 'in', 'the', 'does']

def a_match_b(a, b):
    words_a = re.findall(r'\w+', a.lower())
    words_b = re.findall(r'\w+', b.lower())
    a = set(words_a) - set(exclude)
    b = set(words_b) - set(exclude)
    return len(a.intersection(b)) > 1

print(a_match_b(A, B))
print(a_match_b(A2, B2))
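If you also want to see which words were shared, or to require a different number of common words, one possible variation is sketched below. The names shared_words, STOP_WORDS, and min_common are assumptions added for illustration and are not part of the code above.

```python
import re

# Same exclusion list as above, stored as a frozenset so it can be
# subtracted from other sets directly.
STOP_WORDS = frozenset({"do", "does", "in", "the"})

def shared_words(a, b, stop_words=STOP_WORDS):
    """Return the set of non-excluded words that appear in both sentences."""
    tokens_a = set(re.findall(r"\w+", a.lower())) - stop_words
    tokens_b = set(re.findall(r"\w+", b.lower())) - stop_words
    return tokens_a & tokens_b

def a_match_b(a, b, min_common=2):
    """True when the sentences share at least min_common words."""
    return len(shared_words(a, b)) >= min_common

print(a_match_b("Do Wakanda exist in the world?", "Does Wakanda exist?"))    # True
print(a_match_b("Does Atlantis exist in our world?", "Does Wakanda exist?")) # False
```

With min_common=2 this should behave like the version above; inspecting shared_words directly can help when deciding which words belong in the exclusion list.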

Upvotes: 1
