Reputation: 61
I am trying to search for many keywords in a text file and return the integers/floats that come after each keyword. I think it's possible with a dictionary where the keys are the keywords that appear in the text file and the values are functions that return the value that follows.
import re

def store_text():
    with open("path_to_file.txt", 'r') as f:
        text = f.readlines()
    return text

abc = store_text()

def search():
    for index, line in enumerate(abc):
        if "His age is:" in line:
            return int(re.search(r"\d+", line).group())

dictionary = {
    "His age is:": print(search())
}
The code returns the value I search for in the text file, but in search() I want to avoid typing the keyword again, because it's already in the dictionary. Later on I want to store the values found in an Excel file.
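For reference, a minimal sketch of the keyword-to-extractor mapping described above might look like this; the second keyword and the helper name are purely illustrative, not part of my actual file:

import re

# Hypothetical mapping: each keyword points to a function that pulls the
# number out of a matching line, so the keyword itself is written only once.
extractors = {
    "His age is:": lambda line: int(re.search(r"\d+", line).group()),
    "His height is:": lambda line: float(re.search(r"\d+(\.\d+)?", line).group()),
}

def extract_values(lines):
    found = {}
    for line in lines:
        for keyword, extract in extractors.items():
            if keyword in line:
                found[keyword] = extract(line)
    return found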
Upvotes: 0
Views: 4414
Reputation: 1658
You can put the keywords you need to search for in a list. That way you specify each keyword only once in your program. I've also modified your program to make it a bit more efficient; explanation given in the comments.
import re
import csv

list_of_keywords = ["His age is:", "His number is:", "Your_Keyword3"]  # You can add more keywords to find and match to this list

def store_text():
    with open("/Users/karthick/Downloads/sample.txt", 'r') as f:
        text = f.readlines()
    return text

abc = store_text()

def search(input_file):
    # Initialize an empty dictionary to store the extracted values
    dictionary = dict()
    # Iterate through the lines of the text file
    for line in input_file:
        # For every line, check whether any of the keywords is present
        for keyword in list_of_keywords:
            if keyword in line:
                # If a keyword matches, store the number that follows it
                dictionary.update({keyword: re.search(r"\d+", line).group()})
    return dictionary

# Call the above function with the lines read from the file
output_dict = search(abc)
For storing the output values in a CSV file that Excel can open:
# Write the extracted dictionary to a CSV file
with open('mycsvfile.csv', 'w', newline='') as f:  # Specify the path of your output csv file here; newline='' avoids blank rows on Windows
    w = csv.writer(f)
    w.writerows(output_dict.items())
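If you need an actual .xlsx workbook rather than a CSV, a minimal sketch using openpyxl (a third-party package, assumed to be installed; the output file name is illustrative) would be:

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["keyword", "value"])  # optional header row
for keyword, value in output_dict.items():
    ws.append([keyword, value])
wb.save("extracted_values.xlsx")  # illustrative output file name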
Upvotes: 1
Reputation: 8868
If you already have the keywords in a list, the following approach can help.
import re
from multiprocessing import Pool

search_kwrds = ["His age is:", "His name is:"]  # add more keywords if you need
# Build one regex that matches any of the keywords
# (if a keyword contained regex metacharacters, you would wrap it in re.escape)
search_regex = "|".join(search_kwrds)

def read_search_text():
    with open("path_to_file.txt", 'r') as f:
        text = f.readlines()
    return text

def search(search_line):
    search_res = re.search(search_regex, search_line)
    if search_res:
        kwrd_found = search_res.group(0)
        if kwrd_found:
            suffix_val = int(re.search(r"\d+", search_line).group())
            return {kwrd_found: suffix_val}
    return {}

if __name__ == '__main__':
    search_lines = read_search_text()
    p = Pool(processes=1)  # increase, if you want a faster search
    s_res = p.map(search, search_lines)
    # Merge the per-line dictionaries into a single result dictionary
    search_results = {kwrd: suffix for d in s_res for kwrd, suffix in d.items()}
    print(search_results)
You can add more keywords to the list and search for them. This approach assumes there is at most one keyword on a given line and that a keyword does not repeat on later lines; if it did, the later match would overwrite the earlier one in the result dictionary.
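If a keyword can occur on several lines and you want to keep every value rather than only the last one, one possible adaptation (the function name here is illustrative) is to collect the matches in lists:

import re
from collections import defaultdict

def collect_all(lines, keywords):
    # Map each keyword to a list of every number found after it
    results = defaultdict(list)
    for line in lines:
        for keyword in keywords:
            if keyword in line:
                match = re.search(r"\d+", line)
                if match:
                    results[keyword].append(int(match.group()))
    return dict(results)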
Upvotes: 2