Reputation: 1460
The task is to match keywords against a paragraph. I split the paragraph into a list of words and then checked each word against a list of search terms.
Data:
Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data.
ROUTE OPTIMIZATION – Spring Clean
Aim: Minimizing the overall travel time using optimization techniques.
CUSTOMER SEGMENTATION:
Aim: Develop an engine which segments and provides the score for
customers based on their behavior and analyze their purchasing pattern.
Attempted code:
s = ['tagged', 'product title', 'tagging', 'analyze']
skills = []
for word in data.split():
    print(word)
    word = word.lower()  # str.lower() returns a new string, so keep the result
    if word in s:
        skills.append(word)
skills1 = list(set(skills))
print(skills1)
['tagged', 'tagging', 'analyze']
Because I used split(), every word is separated, so I cannot detect the phrase 'product title', which does appear in the paragraph.
I would appreciate any help with this.
Upvotes: 0
Views: 78
Reputation: 139
Each entry in "data" contains "Aim:", so I find the index of that marker and search only the text after it.
p = "Automatic Product Title Tagging Aim: To automate the process of product title tagging using manually tagged data."
index = p.find("Aim:")  # 32
print(p[index:])
output:
"Aim: To automate the process of product title tagging using manually tagged data."
w_length = len("Aim:")  # 4, used to skip past the marker "Aim:" itself
print(p[index + w_length:])
output:
" To automate the process of product title tagging using manually tagged data."
example:
s = ['tagged', 'product title', 'tagging', 'analyze']
skills = []
for line in data.split("\n"):
    pos = line.find("Aim:")  # -1 when the line has no "Aim:"
    if pos != -1:
        start = pos + len("Aim:")  # skip past the marker itself
        for word in line[start:].split():
            if word.lower() in s:
                skills.append(word)
                print(word)
Upvotes: 0
Reputation: 5354
What you are searching for is not a 'keyword' but a phrase. One solution is to use a regular expression search. A plain "substring in text" check won't work well, because given 'product title' it would also match 'byproduct titles', which isn't what you want.
This should do it:
import re
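# s is the list of search phrases; re.escape guards against regex metacharacters in a phrase
skills = [k for k in s if re.search(r'\b' + re.escape(k) + r'\b', data, flags=re.IGNORECASE)]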
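To make the word-boundary behaviour concrete, here is a minimal self-contained sketch. The helper name find_phrases and the sample strings are made up for illustration; only re.search, re.escape and re.IGNORECASE come from the standard library. It assumes every phrase starts and ends with a letter or digit, since \b would not anchor correctly next to punctuation.

```python
import re

def find_phrases(phrases, text):
    """Return the phrases that occur in text as whole words, ignoring case."""
    found = []
    for phrase in phrases:
        # re.escape protects against regex metacharacters inside the phrase;
        # \b requires a word boundary on both sides of the match.
        pattern = r'\b' + re.escape(phrase) + r'\b'
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found

s = ['tagged', 'product title', 'tagging', 'analyze']
sample = "Automatic Product Title Tagging using manually tagged data."

print(find_phrases(s, sample))
# The boundary check rejects 'product title' embedded in a longer word.
print(find_phrases(['product title'], "a list of byproduct titles"))
```

The first call should report 'tagged', 'product title' and 'tagging'; the second should return an empty list, which is the difference from a plain substring test.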
Upvotes: 3
Reputation: 82755
Iterate over the list s and check whether each element occurs in the string.
Demo:
data = """
Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data.
ROUTE OPTIMIZATION – Spring Clean
Aim: Minimizing the overall travel time using optimization techniques.
CUSTOMER SEGMENTATION:
Aim: Develop an engine which segments and provides the score for
customers based on their behavior and analyze their purchasing
pattern.
"""
s = ['tagged', 'product title', 'tagging', 'analyze']
data = data.lower()
skills = []
for i in s:
    if i.lower() in data:
        skills.append(i)
print(skills)
Or in a single line.
skills = [i for i in s if i.lower() in data]
Output:
['tagged', 'product title', 'tagging', 'analyze']
Upvotes: 2
Reputation: 652
split() splits a string on the separator passed to it; with no argument it splits on runs of whitespace. Since the phrase 'product title' itself contains a space, you can do one of the following:
1) Search for the phrase directly in the paragraph
2) If you split, check each pair of adjacent words (indices i and i+1) for a match
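Option 2 can be sketched roughly as follows. This is only one way to do it: the function name find_keywords, the max_words parameter and the punctuation set being stripped are assumptions made for the example, not part of the question.

```python
def find_keywords(keywords, text, max_words=2):
    """Match keywords against single words and runs of adjacent words."""
    # Lowercase each token and strip common punctuation from its edges.
    tokens = [w.strip('.,:;!?()"\'').lower() for w in text.split()]
    tokens = [t for t in tokens if t]

    # Build every run of 1..max_words adjacent tokens (i, i+1, ...).
    candidates = set()
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            candidates.add(' '.join(tokens[i:i + n]))

    return [k for k in keywords if k.lower() in candidates]

s = ['tagged', 'product title', 'tagging', 'analyze']
line = "Aim: To automate the process of product title tagging using manually tagged data."
print(find_keywords(s, line))
```

Because whole tokens are compared, 'product title' does not match inside 'byproduct titles'. Raise max_words if the keyword list contains phrases longer than two words.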
Upvotes: 0