Cassano

Reputation: 323

Find if list contains something from string

I have a list that contains specific words - I want to find out if something from a string is found in the list.

Here's my code:

words = ["Lorem", "facilisis", "consectetur", "iaculis", "dolor"]

message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " \
          "Aliquam aliquet facilisis orci, scelerisque iaculis odio dignissim nec. " \
          "Vestibulum luctus erat sit amet suscipit commodo"
BA = None
QA = None
EX = None

for i in message:
    if i == words:
        BA = i

print(f"Word found: {BA}")

Output: Word found: None

What's the mistake here?

Upvotes: 0

Views: 91

Answers (3)

hc_dev

Reputation: 9377

Issues

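Your for-loop iterates over each character of the message, not over each word. So it runs once per character (about 177 iterations for this message), and each iteration compares a single character with the whole list: if i == words. A string never equals a list, so the condition is always false and BA stays None (probably not what you want).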
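
To see this for yourself, here is a minimal sketch that reuses the question's words list with a shortened message (the shortening and the names first_items and matches are only for illustration):

```python
words = ["Lorem", "facilisis", "consectetur", "iaculis", "dolor"]
message = "Lorem ipsum dolor"  # shortened for illustration

# Iterating over a str yields one-character strings, not words.
first_items = list(message)[:5]
print(first_items)  # ['L', 'o', 'r', 'e', 'm']

# A str never compares equal to a list, so this is False for every character.
matches = [ch for ch in message if ch == words]
print(matches)  # []
```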

Solution

Besides the list comprehension in Riccardo's answer, which is elegant and concise but a more advanced construct, you could also fix your loop:

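(A) Turn the search around and test each word against the message: if word in message. Note that this is a substring test, so a word also matches when it appears inside a longer word.

(B) Alternatively, first split the message into chunks (words), e.g. using whitespace as the delimiter. Then iterate over those chunks and test whether each one is in your list.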

words = ["Lorem", "facilisis", "consectetur", "iaculis", "dolor"]

message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " \
          "Aliquam aliquet facilisis orci, scelerisque iaculis odio dignissim nec. " \
          "Vestibulum luctus erat sit amet suscipit commodo"

# (A) in approach 
for w in words:
    if w in message:
        print(f"(A) Word found: {w}")

# (B) split approach 
for chunk in message.split():
    if chunk in words:
        print(f"(B) Word found: {chunk}")

Both approaches print the same set of 5 words, only in a different order:

(A) Word found: Lorem
(A) Word found: facilisis
(A) Word found: consectetur
(A) Word found: iaculis
(A) Word found: dolor
(B) Word found: Lorem
(B) Word found: dolor
(B) Word found: consectetur
(B) Word found: facilisis
(B) Word found: iaculis

Note: when str.split() is called without arguments, it splits on runs of whitespace (space, tab, newline, etc.). Punctuation stays attached to the chunks, so 'amet,' would not match 'amet'.
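
The difference between split() and split(' ') can be sketched as follows. The sample text is made up for illustration and is not the question's message:

```python
text = "sit amet,\tconsectetur  elit.\nVestibulum"

# No argument: any run of whitespace (space, tab, newline) counts as one separator.
default_chunks = text.split()
print(default_chunks)  # ['sit', 'amet,', 'consectetur', 'elit.', 'Vestibulum']

# Explicit ' ': only single spaces split, so tabs and newlines stay inside
# the chunks and two spaces in a row produce an empty string.
space_chunks = text.split(' ')
print(space_chunks)  # ['sit', 'amet,\tconsectetur', '', 'elit.\nVestibulum']
```

In both cases the comma and the full stop remain part of their chunk.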

Bonus: improved splitting

To split on more than a single separator character or the default whitespace, use string constants such as:

  • string.whitespace, or the equivalent regex shorthand \s
  • string.punctuation

Combined with re.split (split by regular expression), they let you also find words next to a line break, like 'Vestibulum', or next to a punctuation mark, like ['amet', 'elit', 'orci', 'nec']:

import re
import string

message = "\tLorem ipsum dolor sit amet, consectetur adipiscing elit. " \
          "Aliquam aliquet facilisis orci, scelerisque iaculis odio dignissim nec.\n" \
          "Vestibulum luctus erat sit amet suscipit commodo"
words = ['amet', 'elit', 'orci', 'nec', 'Vestibulum']

# Character class of all punctuation plus any whitespace (\s instead of string.whitespace).
# re.escape keeps characters like ']' and '\' literal; the raw string keeps \s intact.
sep_regex = '[' + re.escape(string.punctuation) + r'\s]'
chunks = re.split(sep_regex, message)
found_words = [w for w in chunks if w in words]
print(found_words)

Prints:

['amet', 'elit', 'orci', 'nec', 'Vestibulum', 'amet']

Note: 'amet' appears twice because it was found twice. To get only the unique words, convert the list to a set with set(found_words).
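
One thing to keep in mind is that a set does not preserve the order in which the words were found. If the order matters, dict.fromkeys is a common alternative; a small sketch using the list printed above (the names unique_unordered and unique_ordered are illustrative):

```python
found_words = ['amet', 'elit', 'orci', 'nec', 'Vestibulum', 'amet']

# A set drops duplicates but has no guaranteed order.
unique_unordered = set(found_words)

# dict keys are unique and keep insertion order (Python 3.7+).
unique_ordered = list(dict.fromkeys(found_words))
print(unique_ordered)  # ['amet', 'elit', 'orci', 'nec', 'Vestibulum']
```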


Upvotes: 2

Memphis Meng

Reputation: 1671

In your code, for i in message: traverses the characters, not the words. So the minimal edit to achieve what you need is to loop over the words instead (the test inside the loop then also has to become if i in words):

for i in message.split(' '):
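
Put together, the question's loop might then look like the sketch below. The break, which stops at the first match, is an assumption about the desired behaviour and not part of the original code; without it, BA would end up holding the last match instead.

```python
words = ["Lorem", "facilisis", "consectetur", "iaculis", "dolor"]

message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " \
          "Aliquam aliquet facilisis orci, scelerisque iaculis odio dignissim nec. " \
          "Vestibulum luctus erat sit amet suscipit commodo"
BA = None

for i in message.split(' '):
    if i in words:      # membership test instead of i == words
        BA = i
        break           # assumption: stop at the first word found

print(f"Word found: {BA}")  # Word found: Lorem
```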

Upvotes: 0

Riccardo Bucco

Reputation: 15364

Try this:

words_found = [word for word in words if word in message]
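
Note that word in message is a substring test, so it also matches inside longer words. If only whole words should count, one option is a regex word boundary; a minimal sketch with made-up sample data (the shortened words and message, and the names substring_hits and whole_word_hits, are only for illustration):

```python
import re

words = ["Lorem", "dolor", "sit"]
message = "Lorem ipsum dolores visit"

# Substring test: 'dolor' matches inside 'dolores', 'sit' inside 'visit'.
substring_hits = [word for word in words if word in message]
print(substring_hits)  # ['Lorem', 'dolor', 'sit']

# Whole-word test: \b only matches at word boundaries.
whole_word_hits = [
    word for word in words
    if re.search(rf"\b{re.escape(word)}\b", message)
]
print(whole_word_hits)  # ['Lorem']
```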

Upvotes: 0
