Reputation: 838
I have a list of strings, say:
x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
I need to extract the items of x1 that are present in a few sentences.
My sentence is "eskimo lives as a wild man in wild jungle and he stands as a guard".
In this sentence, I need to extract the first word, eskimo, and the fifth and sixth words, wild man, because they appear as separate words, as in x1. I should not extract "stands", even though sta is present in stands.
def get_name(input_str):
    prod_name = []
    for row in x1:
        if (row.strip().lower() in input_str.lower().strip()) or (len([x for x in input_str.split() if "\b"+x in row])>0):
            prod_name.append(row)
    return list(set(prod_name))
The function
get_name("eskimo lives as a wild man in wild jungle and he stands as a guard")
returns
[esk, eskimo,wild man,sta]
But the expected is
[eskimo,wild man]
May I know what has to be changed in the code?
Upvotes: 2
Views: 755
Reputation: 163362
You could use a regex with whitespace boundaries on the left, (?<!\S), and on the right, (?!\S), to avoid partial matches, and join all the items from the x1 list into a single alternation.
Then use re.findall to get all the matches:
import re
x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
s = "eskimo lives as a wild man in wild jungle and he stands as a guard"
pattern = fr"(?<!\S)(?:{'|'.join(re.escape(x) for x in x1)})(?!\S)"
print(re.findall(pattern, s))
Output
['eskimo', 'wild man']
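The same idea can be wrapped in a reusable function. The following is a minimal sketch, not part of the original answer: the name find_terms is hypothetical, and both the case-insensitive flag and the longest-first ordering of the alternation are assumptions added here for illustration.

```python
import re

def find_terms(terms, sentence):
    """Return the terms that occur in `sentence` as whole, whitespace-delimited tokens."""
    # Escape each term so characters like ( + [ are treated literally.
    # Sorting longest first is an optional precaution so that longer terms
    # are tried before their own prefixes (e.g. 'eskimo' before 'esk').
    alternation = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    # (?<!\S) and (?!\S) require whitespace or a string edge on both sides.
    pattern = re.compile(rf"(?<!\S)(?:{alternation})(?!\S)", re.IGNORECASE)
    return pattern.findall(sentence)

x1 = ['esk', 'wild man', 'eskimo', 'sta']
s = "Eskimo lives as a wild man in wild jungle and he stands as a guard"
print(find_terms(x1, s))  # ['Eskimo', 'wild man']
```

Because the boundaries are whitespace-based, a term that is directly followed by punctuation (for example "wild man.") would not be matched by this sketch.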
Upvotes: 0
Reputation: 1577
You can use regular expressions
import re
x1 = ['esk','wild man','eskimo', 'sta']
my_str = "eskimo lives as a wild man in wild jungle and he stands as a guard"
my_list = []
for words in x1:
    if re.search(r'\b' + words + r'\b', my_str):
        my_list.append(words)
print(my_list)
With the updated list, the string (+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa raises an error when it is used as a regular expression, so you can wrap the search in a try/except block:
for words in x1:
    try:
        if re.search(r'\b' + words + r'\b', my_str):
            my_list.append(words)
    except re.error:
        # The term is not a valid regular expression; skip it
        pass
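As an alternative to skipping terms that fail to compile, each term can be escaped with re.escape so that it is always matched literally. This is a sketch of that variation rather than the approach described above. It also swaps \b for the lookarounds (?<!\w) and (?!\w), an assumption made here because \b only matches in front of a leading "(" when a word character comes directly before it.

```python
import re

x1 = ['esk', 'wild man', 'eskimo', 'sta',
      '(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
my_str = "eskimo lives as a wild man in wild jungle and he stands as a guard"

my_list = []
for words in x1:
    # re.escape turns metacharacters such as ( + [ into literal characters,
    # so every term compiles and no try/except is needed.
    # (?<!\w) and (?!\w) reject matches that touch a letter, digit or underscore.
    if re.search(r'(?<!\w)' + re.escape(words) + r'(?!\w)', my_str):
        my_list.append(words)
print(my_list)  # ['wild man', 'eskimo']
```

With this variation the long chemical name is actually searched for instead of being silently dropped, so it would be reported if it appeared in the sentence.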
Upvotes: 1
Reputation: 149
I have a slightly different approach. First, split the input sentence into words, and split each phrase you want to check for into its constituent words. Then check whether all the words of a phrase are present in the sentence.
x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
input_sentence = "eskimo lives as a wild man in wild jungle and he stands as a guard"
# Remove all punctuation marks from the sentence
input_sentence = input_sentence.replace('!', '').replace('.', '').replace('?', '').replace(',', '')
# Split the input sentence into its component words to check individually
input_words = input_sentence.split()
for ele in x1:
    # Split each element in x1 into words
    ele_words = ele.split()
    # Check that all words are part of the input words and that the phrase itself occurs in the sentence
    if all(word in input_words for word in ele_words) and ele in input_sentence:
        print(ele)
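One edge case of the check above: the substring test on the whole sentence can still produce a partial match for multi-word phrases. For example, the phrase "man in" would be reported for "a woman in a jungle met a man", because both words occur separately and "man in" is a substring of "woman in". A stricter variation, sketched below under the assumption that phrases must match consecutive whole words, compares the phrase against every window of consecutive words. The function name find_phrases is hypothetical.

```python
def find_phrases(phrases, sentence):
    """Return the phrases whose words appear consecutively, as whole words, in `sentence`."""
    # Strip the same punctuation marks as above, then split on whitespace
    for mark in "!.?,":
        sentence = sentence.replace(mark, "")
    words = sentence.split()

    found = []
    for phrase in phrases:
        target = phrase.split()
        n = len(target)
        # Compare the phrase against every window of n consecutive words
        if n and any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            found.append(phrase)
    return found

x1 = ['esk', 'wild man', 'eskimo', 'sta']
print(find_phrases(x1, "eskimo lives as a wild man in wild jungle and he stands as a guard"))
# ['wild man', 'eskimo']
print(find_phrases(['man in'], "a woman in a jungle met a man"))
# []
```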
Upvotes: 2
Reputation: 92
You could simply use str.split(" ") to get a list of all the words in the sentence, and then do the following:
s = "eskimo lives as a wild man in wild jungle and he stands as a guard"
l = s.split(" ")
x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
new_x1 = [word.split(" ") for word in x1 if " " in word] + [word for word in x1 if " " not in word]
ans = []
for x in new_x1:
    if isinstance(x, str):
        if x in l:
            ans.append(x)
    else:
        temp = " ".join(x)
        if all(sub_x in l for sub_x in x) and temp in s:
            ans.append(temp)
print(ans)
Upvotes: 2