Reputation: 1
I'm trying to search a string for keywords, each together with the word that follows it. I want to store the results in an already created text file (f). I have written this code so far:
def keyword_extraction(text, keyword_list, f):
    temp = re.findall(r"[\w']+", text)
    for keyword in keyword_list:
        if keyword in temp:
            results = [temp[temp.index(keyword) + 1]]
            for word in results:
                f.writelines(keyword + ': ' + word + '\n')
        else:
            f.writelines('Keyword "' + keyword + '" not found\n')
The problem is that once a keyword is found, only its first occurrence is written. I want to extract every occurrence, so a keyword that appears twice in the text should be written twice. Do you have any suggestions for how I can fix that?
Example input:
text = "today is a sunny day dont you think? I like this day very much"
keyword_list = ['like', 'day']
Expected output:
like: this
day: dont
day: very
Actual output:
like: this
day: dont
Thank you for your help!
Upvotes: 0
Views: 384
Reputation: 123481
You can do it by looping and removing the first occurrence of each keyword found from the list being searched until no more are left.
import re

def keyword_extraction(text, keyword_list, f):
    temp = re.findall(r"[\w']+", text)
    for keyword in keyword_list:
        found = False
        while keyword in temp:
            found = True
            try:
                next_word = temp[temp.index(keyword) + 1]
            except IndexError:
                # The keyword is the last word, so nothing follows it.
                next_word = ''
            f.writelines(keyword + ': ' + next_word + '\n')
            # Drop this occurrence so the next loop pass finds the following one.
            temp.remove(keyword)
        if not found:
            f.writelines('Keyword "' + keyword + '" not found\n')

text = "today is a sunny day dont you think? I like this day very much"
keyword_list = ['like', 'day', 'much']
with open('keyword_search_results.txt', 'w') as f:
    keyword_extraction(text, keyword_list, f)
print('fini')
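If you would rather not modify the word list while searching it, a variation on the same idea is to pass a start position to list.index so that each search resumes just after the previous match. The sketch below is only an illustration: the helper name find_following_words is made up for this example, and it returns the following words instead of writing them to a file.

```python
import re

def find_following_words(text, keyword):
    """Return the word after each occurrence of keyword ('' if nothing follows)."""
    tokens = re.findall(r"[\w']+", text)
    following = []
    start = 0
    while True:
        try:
            # list.index(value, start) searches from position `start` onward.
            pos = tokens.index(keyword, start)
        except ValueError:
            break  # no further occurrences
        following.append(tokens[pos + 1] if pos + 1 < len(tokens) else '')
        start = pos + 1
    return following

text = "today is a sunny day dont you think? I like this day very much"
print(find_following_words(text, 'day'))  # ['dont', 'very']
```

Because the list is left untouched, a removed keyword can never change which word is reported as following another keyword.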
Upvotes: 0
Reputation: 811
text = "today is a sunny day dont you think? I like this day very much"
keyword_list = ['like', 'day']
splitted_text = text.split()
for index, word in enumerate(splitted_text):
    # Skip a keyword in last position: there is no following word to print.
    if word in keyword_list and index + 1 < len(splitted_text):
        print(f'{word}: {splitted_text[index+1]}')
Output:
day: dont
like: this
day: very
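Note that this prints matches in text order and does not report missing keywords. If you need the output grouped by keyword and written to a file, as in the question, the same enumerate idea can be applied per keyword. This is a rough sketch, not a drop-in replacement: the function name write_keyword_pairs is invented here, it tokenizes with the regex from the question instead of split(), and it uses io.StringIO as a stand-in for the real file.

```python
import io
import re

def write_keyword_pairs(text, keyword_list, f):
    tokens = re.findall(r"[\w']+", text)
    for keyword in keyword_list:
        # Collect the position of every occurrence of this keyword.
        positions = [i for i, tok in enumerate(tokens) if tok == keyword]
        if not positions:
            f.write(f'Keyword "{keyword}" not found\n')
            continue
        for i in positions:
            # An empty string is written when the keyword is the last word.
            next_word = tokens[i + 1] if i + 1 < len(tokens) else ''
            f.write(f'{keyword}: {next_word}\n')

text = "today is a sunny day dont you think? I like this day very much"
buffer = io.StringIO()  # stand-in for an open text file
write_keyword_pairs(text, ['like', 'day'], buffer)
print(buffer.getvalue())
```

With the example input this writes like: this, day: dont and day: very, in that order.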
Upvotes: 3