Reputation: 1257

Check if elements of list with words and phrases exist in another list

I have a list of words and phrases:

words = ['hi', 'going on', 'go']

And I have a transcript:

transcript = "hi how are you. I am good. what's going on.".split('.')

I need to find matches in this transcript. For the example above, matches are in the first and third elements of the transcript.

I followed answers from here and I tried to use the following code:

for i in range(len(transcript)):
    if any(word in transcript[i] for word in words):
        print(i + 1)  # report 1-based positions, matching the output below

Its output is:

1
2
3

But that is not what I want. I want to exclude the 'I am good' sentence from the output. The expected output is:

1
3

Upvotes: 0

Answers (3)

Alain T.

Reputation: 42133

The issue is that you are not limiting your search to whole expressions. Any search term that appears as a substring of another word (e.g. "go" is a substring of "good") is therefore treated as a match.

This calls for regular expressions (the re module), which can match on word boundaries.
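As a rough sketch of the regular-expression route (not necessarily the only way to write it), the search terms can be joined into a single alternation wrapped in \b word boundaries. The names pattern and matches are just illustrative, and re.IGNORECASE is an assumption about the desired behaviour:

```python
import re

words = ['hi', 'going on', 'go']
transcript = "hi how are you. I am good. what's going on.".split('.')

# \b is a word boundary, so "go" cannot match inside "good".
# re.escape protects any special characters in the search terms.
alternatives = "|".join(re.escape(word) for word in words)
pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

# 1-based positions of the sentences containing a whole-word match
matches = [i for i, text in enumerate(transcript, 1) if pattern.search(text)]
print(matches)  # [1, 3]
```

Compiling one pattern up front avoids looping over the word list for every sentence.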

Alternatively, you could convert every non-letter character to a space, then pad both the search terms and the text with spaces so that only whole words (whole expressions, in your case) are found.

For example:

# translation table for all non-letters to spaces
from string import printable,ascii_letters
spaces     = str.maketrans({nl:" " for nl in set(printable)-set(ascii_letters)})

words       = ['hi', 'going on', 'go']
paddedWords = [f" {word} " for word in words]

transcript = "hi how are you. I am good. what's going on.".split('.')
for i,text in enumerate(transcript,1):
    paddedText = f" {text.lower().translate(spaces)} "
    if any( word in paddedText for word in paddedWords):
        print(i)

# 1
# 3

Upvotes: 0

Leo Arad

Reputation: 4472

You can try

for i in range(len(transcript)):
    if any(word in transcript[i].split(" ") if len(word.split(" ")) < 2 else word in transcript[i] for word in words):
        print(i+1)

That will output

1
3

Single words are compared against the whole tokens of transcript[i], so 'go' no longer matches inside 'good'. Multi-word phrases are still matched as substrings.
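To illustrate the difference, here is a small sketch comparing a substring test with a membership test on the split tokens. The variable names are only for illustration, and the last line shows a caveat of this approach: punctuation stays attached to the tokens.

```python
sentence = " I am good"

# Substring test: "go" is found inside "good"
substring_hit = "go" in sentence
# Token test: "go" must equal one of the whitespace-separated tokens
token_hit = "go" in sentence.split()

print(substring_hit)  # True
print(token_hit)      # False

# Caveat: split() keeps punctuation attached, so the token here is "hi,"
punctuated_hit = "hi" in "hi, how are you".split()
print(punctuated_hit)  # False
```

If the transcript contains commas or other punctuation inside sentences, it may need to be stripped before splitting.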

Upvotes: 1

Dhaval Taunk

Reputation: 1672

The extra match appears because 'go' is present as a substring of 'I am good'.

You can use this as the if condition:

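for i in range(len(transcript)):
    # single words must equal a whole token; phrases are matched as substrings
    if any(word in transcript[i].split() if len(word.split()) < 2 else word in transcript[i] for word in words):
        print(i+1)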

This will give you the desired output:

1
3

Upvotes: 0
