I'm trying to find every occurrence of each word or phrase from a list in a text, so I wrote this code:
```python
narrative = "Lasix 40 mg b.i.d., for three days along with potassium chloride slow release 20 mEq b.i.d. for three days, Motrin 400 mg q.8h"
meds_name_final_list = ["lasix", "potassium chloride slow release", ...]

def all_occurences(text, word):
    # Yield the start index of every non-overlapping occurrence of word in text
    initial = 0
    while True:
        initial = text.find(word, initial)
        if initial == -1:
            return
        yield initial
        initial += len(word)

offset = []
for item in meds_name_final_list:
    number = list(all_occurences(narrative.lower(), item))
    offset.append(number)
```
Desired output: for each word in the list, the starting index (or indices) at which it occurs in the text, e.g.:
offset = [[1], [3, 10], [5, 50], ...]
This works for short terms such as "antibiotics", "emergency ward", "insulin", etc. However, longer terms that are split across a line break in the text are not detected by the function above.
Example of a term that is missed: potassium chloride slow release
Any suggestions on how to solve this?
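A minimal reproduction of the problem, assuming the line break falls between "slow" and "release" (the text `notes` below is hypothetical). `str.find` compares characters literally, so the newline in the text never matches the space in the search string:

```python
def all_occurences(text, word):
    # Same logic as the function in the question
    initial = 0
    while True:
        initial = text.find(word, initial)
        if initial == -1:
            return
        yield initial
        initial += len(word)

# Hypothetical note in which the long term wraps onto a second line
notes = "Lasix 40 mg b.i.d., potassium chloride slow\nrelease 20 mEq b.i.d."
terms = ["lasix", "potassium chloride slow release"]

offset = [list(all_occurences(notes.lower(), term)) for term in terms]
print(offset)  # [[0], []]  (the wrapped term is not found)
```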
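Answer: replace the newlines with spaces before searching, so that a term broken across lines matches the search string. Because each '\n' is replaced by exactly one ' ', the length of the text does not change, and the returned indices are still valid for the original text: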
```python
def all_occurences(text, word):
    initial = 0
    # Turn line breaks into spaces so multi-word terms split across lines match
    text = text.replace('\n', ' ')
    while True:
        initial = text.find(word, initial)
        if initial == -1:
            return
        yield initial
        initial += len(word)
```
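A variant of the same idea, offered as a sketch rather than a drop-in replacement: instead of normalising only '\n', build a regular expression per term that allows any run of whitespace (line breaks, "\r\n", tabs, repeated spaces) between its words. Nothing in the text is replaced, so the indices point into the original string. The function name `find_term_offsets` and the shortened `meds` list are hypothetical.

```python
import re

def find_term_offsets(text, terms):
    """Return, for each term, the start indices where it occurs in text.

    Words of a term may be separated in the text by any whitespace run,
    so a term wrapped onto the next line still matches. Matching is
    case-insensitive and non-overlapping.
    """
    offsets = []
    for term in terms:
        # Escape each word so characters such as '.' are matched literally,
        # then allow one or more whitespace characters between the words.
        pattern = r"\s+".join(re.escape(w) for w in term.split())
        regex = re.compile(pattern, re.IGNORECASE)
        offsets.append([m.start() for m in regex.finditer(text)])
    return offsets

narrative = ("Lasix 40 mg b.i.d., for three days along with potassium chloride slow\n"
             "release 20 mEq b.i.d. for three days, Motrin 400 mg q.8h")
meds = ["lasix", "potassium chloride slow release", "motrin"]
print(find_term_offsets(narrative, meds))  # [[0], [46], [108]]
```

Like `str.find`, this matches substrings, so "lasix" would also be found inside a longer word; wrapping the pattern as r"\b" + pattern + r"\b" restricts it to whole words.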