Mathieu

Reputation: 761

Check if a string contains at least two of the strings in a list

I want to know whether a string contains at least two words from a list. Some words appear more than once in the list, so I need to find two different words from the list in the string.

I tried this:

keywords = ["word1", "word1", "word1", "word2", "word3"]
r = "word4 word2 word1 word5"

for keyword in keywords:
    if keyword in r:
        for keyword2 in keywords:
            if keyword2 in r:
                if keyword2 != keyword:
                    status="ok"
                    print("here at least 2 words matching")
                    break

Upvotes: 2

Views: 1295

Answers (3)

Jolbas

Reputation: 752

If you put the words from the CSV in a set, you can use `set.intersection()` to find all the words it has in common with the string.

keyword_set = set(keywords)
common = keyword_set.intersection(r.split())
if len(common) >= 2:
    print('Found:', common)
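
As a rough sketch of how this could be wrapped in a reusable check (the function name `has_min_keywords` and the `minimum` parameter are made up for illustration), note that splitting the string matches whole words only, whereas `keyword in r` also matches substrings:

```python
def has_min_keywords(text, keywords, minimum=2):
    """Return True if at least `minimum` distinct keywords occur as whole words in `text`."""
    # set() drops duplicate keywords; split() breaks the text on whitespace
    common = set(keywords).intersection(text.split())
    return len(common) >= minimum


keywords = ["word1", "word1", "word1", "word2", "word3"]

print(has_min_keywords("word4 word2 word1 word5", keywords))  # True
print(has_min_keywords("word4 word1 word1 word5", keywords))  # False: only one distinct keyword
print(has_min_keywords("word4 word22 word1", keywords))       # False: "word22" is not the word "word2"
```

If substring matches should count as well, the `in` based loops are the better fit.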

The approach suggested by Tomerikoo, which breaks as soon as two items are found, is faster and can also be altered to stop when the same word is found twice. However, it reports only the first two matches even if there are more. Here's a shortened version:

# To find only unique words, use `words_found = set()`
words_found = []
for word in set(keywords):
    if word in r:
        # If words_found is a set, use `words_found.add(word)`
        words_found.append(word)
        if len(words_found) >= 2:
            print("Found:", words_found)
            break

Upvotes: 4

Tomerikoo

Reputation: 19414

First convert the list to a set to remove the duplicates. Then create an iterator over that set and check that you can match the required number of words:

keywords = iter(set(keywords))
num_of_words_to_find = 2
words_found = []

for _ in range(num_of_words_to_find-1):
    for word in keywords:
        if word in r:
            words_found.append(word)
            break

for word in keywords:
    if word in r:
        words_found.append(word)
        print(f"Found {num_of_words_to_find} words:", ', '.join(words_found))
        break
else:
    print(f"No {num_of_words_to_find} different words in string")
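
The same early-exit idea can probably be written more compactly with `itertools.islice`. This is only a sketch: the helper name `find_distinct_matches` and its `wanted` parameter are assumptions, and it uses `dict.fromkeys` instead of `set` so that the order of the results is predictable.

```python
from itertools import islice


def find_distinct_matches(text, keywords, wanted=2):
    """Return up to `wanted` distinct keywords that occur in `text` (substring match)."""
    # dict.fromkeys removes duplicates while keeping first-seen order
    unique = dict.fromkeys(keywords)
    # Lazy generator: a keyword is only tested when islice asks for the next match
    matches = (word for word in unique if word in text)
    # islice stops pulling from the generator once `wanted` matches are collected
    return list(islice(matches, wanted))


keywords = ["word1", "word1", "word1", "word2", "word3"]
found = find_distinct_matches("word4 word2 word1 word5", keywords)

if len(found) == 2:
    print("Found 2 words:", ", ".join(found))  # Found 2 words: word1, word2
else:
    print("No 2 different words in string")
```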

Upvotes: 2

Ani

Reputation: 539

Load the CSV word list into a list of strings, remove duplicates, and then check every word against the string, counting the matches:

counter = 0
for keyword in wordslist:
    if keyword in r:
        print(keyword + " in string")
        counter = counter + 1

if counter >= 2:
    print("at least two matching words are in the string")
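
For completeness, here is one way the loading and de-duplication step might look. The CSV layout (one keyword per row, first column) is an assumption, and `io.StringIO` stands in for a real file so the snippet runs on its own:

```python
import csv
import io

# Stand-in for an open CSV file; the one-keyword-per-row layout is an assumption.
csv_data = io.StringIO("word1\nword1\nword1\nword2\nword3\n")

# Take the first column of each non-empty row, dropping duplicates but keeping order.
wordslist = list(dict.fromkeys(row[0] for row in csv.reader(csv_data) if row))

r = "word4 word2 word1 word5"

# Count how many distinct keywords occur in the string (substring match).
counter = sum(1 for keyword in wordslist if keyword in r)

print(wordslist)  # ['word1', 'word2', 'word3']
if counter >= 2:
    print("at least two matching words are in the string")
```

With a real file, `csv_data` would be replaced by something like `open(path, newline="")`, where `path` is whatever your file is called.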

Upvotes: 0
