Reputation: 323
I have a list of specific words, and I want to find out whether any word from a string appears in that list.
Here's my code:
words = ["Lorem", "facilisis", "consectetur", "iaculis", "dolor"]
message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " \
"Aliquam aliquet facilisis orci, scelerisque iaculis odio dignissim nec. " \
"Vestibulum luctus erat sit amet suscipit commodo"
BA = None
QA = None
EX = None
for i in message:
    if i == words:
        BA = i
print(f"Word found: {BA}")
Output: Word found: None
What's the mistake here?
Upvotes: 0
Views: 91
Reputation: 9377
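Your for-loop iterates over each character in the message, not over each word. So it runs once per character (about 177 iterations here), each time checking whether a single character equals the whole list, which is probably not what you want: if i == words. That comparison is never true, so BA stays None.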
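To make that concrete, here is a small sketch with a shortened message (the names chars and matches are made up for illustration) showing what the loop actually compares:

```python
words = ["Lorem", "dolor"]
message = "Lorem ipsum"

# Iterating over a str yields one-character strings, not words
chars = [c for c in message]
print(chars[:5])   # ['L', 'o', 'r', 'e', 'm']

# A single character (a str) never equals a list, so nothing matches
matches = [c for c in message if c == words]
print(matches)     # []
```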
Besides Riccardo's answer, which uses a list comprehension (elegant and concise, but a more advanced construct), you could also fix your loop:
(A) Turn the search around and loop over the list, testing if word in message.
(B) Alternatively, first split the message into chunks (words), e.g. using whitespace as the delimiter. Then iterate over those chunks and test whether each one is in your list.
words = ["Lorem", "facilisis", "consectetur", "iaculis", "dolor"]
message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " \
"Aliquam aliquet facilisis orci, scelerisque iaculis odio dignissim nec. " \
"Vestibulum luctus erat sit amet suscipit commodo"
# (A) in approach
for w in words:
    if w in message:
        print(f"(A) Word found: {w}")
# (B) split approach
for chunk in message.split():
    if chunk in words:
        print(f"(B) Word found: {chunk}")
Each approach prints the same set of 5 words, but in a different order: (A) follows the order of the list, (B) the order of the message:
(A) Word found: Lorem
(A) Word found: facilisis
(A) Word found: consectetur
(A) Word found: iaculis
(A) Word found: dolor
(B) Word found: Lorem
(B) Word found: dolor
(B) Word found: consectetur
(B) Word found: facilisis
(B) Word found: iaculis
Note: when str.split() is invoked without arguments, it splits on runs of whitespace (space, tab, new-line, etc.).
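As a quick illustration of that default (the sample text is made up), compare it with an explicit single-space separator, which behaves differently around tabs, line breaks and repeated spaces:

```python
text = "one  two\tthree\nfour"

# No argument: split on runs of any whitespace and drop empty strings
default_chunks = text.split()
print(default_chunks)   # ['one', 'two', 'three', 'four']

# Explicit ' ': split on every single space only; tab and newline stay inside the chunk
space_chunks = text.split(' ')
print(space_chunks)     # ['one', '', 'two\tthree\nfour']
```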
To split on more than a single separator character or the default whitespace, use the string constants string.whitespace (or its regex shorthand \s) and string.punctuation. In combination with re.split (split by regular expression), you can improve the split so that it also finds words next to a line break, like 'Vestibulum', or words next to a punctuation mark, like ['amet', 'elit', 'orci', 'nec']:
message = "\tLorem ipsum dolor sit amet, consectetur adipiscing elit. " \
"Aliquam aliquet facilisis orci, scelerisque iaculis odio dignissim nec.\n" \
"Vestibulum luctus erat sit amet suscipit commodo"
words = ['amet', 'elit', 'orci', 'nec', 'Vestibulum']
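import string
import re
# Escape the punctuation so characters like ']', '\' and '^' are literal inside the class;
# the raw string keeps \s as the regex shorthand for whitespace (instead of string.whitespace)
sep_regex = '[' + re.escape(string.punctuation) + r'\s]'
chunks = re.split(sep_regex, message)  # adjacent separators yield empty chunks, which never match
found_words = [w for w in chunks if w in words]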
print(found_words)
Prints:
['amet', 'elit', 'orci', 'nec', 'Vestibulum', 'amet']
Note: It contains 'amet' twice because it was found twice. To get only the unique words found, convert it to a set using set(found_words).
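A short sketch of that de-duplication, reusing the printed list from above. The dict.fromkeys variant is an addition not in the original answer; it may be handy if the first-seen order should be kept, since a set does not guarantee any order:

```python
found_words = ['amet', 'elit', 'orci', 'nec', 'Vestibulum', 'amet']

# set() removes duplicates but makes no promise about order
unique_unordered = set(found_words)

# dict keys are unique and keep insertion order, so this preserves first-seen order
unique_ordered = list(dict.fromkeys(found_words))
print(unique_ordered)   # ['amet', 'elit', 'orci', 'nec', 'Vestibulum']
```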
Upvotes: 2
Reputation: 1671
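In your code, for i in message: traverses the characters, not the words. So the minimal edit to the loop header is:
for i in message.split(' '):
The test inside the loop then also needs to be a membership check, if i in words, because a single word never equals the whole list.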
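Put together, a minimal version of the loop from the question might look like this. It is only a sketch: the message is shortened, the found list is a hypothetical addition for collecting matches, and the comparison uses in rather than ==:

```python
words = ["Lorem", "facilisis", "consectetur", "iaculis", "dolor"]
message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

found = []  # hypothetical helper list, not in the original code
for i in message.split(' '):
    if i in words:  # membership test: is this chunk one of the words?
        found.append(i)
        print(f"Word found: {i}")
```

Chunks that carry punctuation, such as "amet,", will not match a plain "amet" in the list with this approach.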
Upvotes: 0
Reputation: 15364
Try this:
words_found = [word for word in words if word in message]
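One thing to keep in mind: word in message is a substring test, so short entries can match inside longer words. If whole-word matches are wanted, a variant using re.search with \b word boundaries could look like the sketch below (the extra list entries "or" and "amet" and the names substring_hits and whole_word_hits are assumptions for illustration):

```python
import re

words = ["Lorem", "or", "dolor", "amet"]
message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

# Substring test: "or" is reported because it occurs inside "Lorem"
substring_hits = [w for w in words if w in message]

# Whole-word test: \b only matches at a word boundary
whole_word_hits = [
    w for w in words
    if re.search(r"\b" + re.escape(w) + r"\b", message)
]
print(substring_hits)    # ['Lorem', 'or', 'dolor', 'amet']
print(whole_word_hits)   # ['Lorem', 'dolor', 'amet']
```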
Upvotes: 0