Reputation: 2692
I'm trying to find keywords within a sentence, where the keywords are usually single words, but can be multi-word combos (like "cost in euros"). So if I have a sentence like "cost in euros of bacon", it would find "cost in euros" in that sentence and return true.
For this, I was using this code:
if any(phrase in line for phrase in keyword['aliases']):
where line is the input and aliases is an array of phrases that match a keyword (for example, for "cost in euros" it's ['cost in euros', 'euros', 'euro cost']).
However, I noticed that it was also triggering on word parts. For example, I had a match phrase of "y" and a sentence of "trippy cake". I'd not expect this to return true, but it does, since it apparently finds the "y" in "trippy". How do I get this to only check whole words? Originally I was doing this keyword search with a list of words (essentially doing line.split() and checking those), but that doesn't work for multi-word keyword aliases.
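A minimal reproduction of both problems, as a sketch only: the keyword dict and the two helper names (substring_match, split_match) are made up for illustration and are not from my real code.

```python
# Hypothetical keyword entry, mirroring the structure described above.
keyword = {'aliases': ['y', 'cost in euros']}

def substring_match(line, aliases):
    # `in` on two strings is a substring test, not a whole-word test.
    return any(phrase in line for phrase in aliases)

def split_match(line, aliases):
    # Whole-word test against the split words; a multi-word alias
    # can never equal a single word, so it never matches.
    words = line.split()
    return any(phrase in words for phrase in aliases)

print(substring_match('trippy cake', keyword['aliases']))         # True (unwanted)
print(split_match('trippy cake', keyword['aliases']))             # False
print(split_match('cost in euros of bacon', keyword['aliases']))  # False (unwanted)
```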
Upvotes: 2
Views: 509
Reputation: 1714
This should accomplish what you're looking for:
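import re

# Sentences to search (named aliases here, but they play the role of the input lines).
aliases = [
    'cost.',
    '.cost',
    '.cost.',
    'cost in euros of bacon',
    'rocking euros today',
    'there is a cost inherent to bacon',
    'europe has cost in place',
    'there is a cost.',
    'I was accosted.',
    'dealing with euro costing is painful']
# Keyword phrases to look for.
phrases = ['cost in euros', 'euros', 'euro cost', 'cost']
# \b matches only at a word boundary, so 'cost' does not match inside 'accosted'.
# re.escape() keeps any regex metacharacters in a phrase literal.
matched = list(set([
    alias
    for alias in aliases
    for phrase in phrases
    if re.search(r'\b{}\b'.format(re.escape(phrase)), alias)
]))
print(matched)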
Output:
['there is a cost inherent to bacon', '.cost.', 'rocking euros today', 'there is a cost.', 'cost in euros of bacon', 'europe has cost in place', 'cost.', '.cost']
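Basically, we're grabbing all matches with a compound list comprehension, using Python's re module as our test. The \b on either side of the phrase matches only at a word boundary, which is what stops cost from matching inside accosted. Since an alias can contain more than one phrase, it can show up more than once, so we use set() to trim duplicates from the list, then list() to coerce the set back into a list. Because a set is unordered, the order of the output may differ from what is shown here.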
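If all you need is the true/false check from the question, the same word-boundary test can probably be dropped into the any() call. This is a rough sketch rather than a definitive version: contains_phrase is a name I made up, and it assumes every alias starts and ends with a word character (letter, digit or underscore), since \b only works next to one.

```python
import re

def contains_phrase(line, aliases):
    """Return True if any alias occurs in line as a whole word or phrase."""
    return any(
        # re.escape() treats the alias as literal text; \b anchors both ends
        # to word boundaries so partial-word hits are rejected.
        re.search(r'\b{}\b'.format(re.escape(phrase)), line)
        for phrase in aliases
    )

euro_aliases = ['cost in euros', 'euros', 'euro cost']

print(contains_phrase('cost in euros of bacon', euro_aliases))  # True
print(contains_phrase('trippy cake', ['y']))                    # False
print(contains_phrase('I was accosted.', ['cost']))             # False
```

In the original code that would read something like if contains_phrase(line, keyword['aliases']):, and multi-word aliases keep working because the whole phrase is matched as one pattern.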
Refs:
Lists: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists
List comprehensions: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
Sets: https://docs.python.org/3/tutorial/datastructures.html#sets
re (or regex): https://docs.python.org/3/library/re.html#module-re
Upvotes: 2