Wessowang

Reputation: 61

Searching for keywords in a text file using a dictionary in Python

I am trying to search a text file for many keywords and return the integers/floats that follow each keyword. I think this is possible with a dictionary whose keys are the keywords in the text file and whose values are functions that return the value that follows.

import re

def store_text():
    with open("path_to_file.txt", 'r') as f:
        text = f.readlines()
        return text

abc = store_text()

def search():
        for index, line in enumerate(abc):
            if "His age is:" in line:
                return int(re.search(r"\d+", line).group())


dictionary = {
    "His age is:": print(search())
}

The code returns the value I am searching for in the text file, but in search() I want to avoid typing the keyword again, because it is already in the dictionary. Later on, I want to store the values found in an Excel file.

Upvotes: 0

Views: 4414

Answers (2)

Karthick Mohanraj

Reputation: 1658

You can put the keywords you need to search for in a list, so that each keyword is specified only once in your program. I've also modified your program to make it a bit more efficient; the explanation is given in the comments.

import re
import csv

list_of_keywords = ["His age is:","His number is:","Your_Keyword3"]   # You can add more keywords to find and match to this list

def store_text():
    with open("/Users/karthick/Downloads/sample.txt", 'r') as f:
        text = f.readlines()
        return text

abc = store_text()

def search(input_file):
    # Initialize an empty dictionary to store the extracted values
    dictionary = dict()
    # Iterate through the lines of the text file
    for line in input_file:
        # For every line, check whether any of the keywords is present
        for keyword in list_of_keywords:
            if keyword in line:
                # Store the first run of digits on the line, if there is one
                match = re.search(r"\d+", line)
                if match:
                    dictionary[keyword] = match.group()
    return dictionary

#Call the above function with input
output_dict = search(abc)

To store the output values in a CSV file that Excel can open:

# Write the extracted dictionary to a CSV file
with open('mycsvfile.csv', 'w', newline='') as f:  # Specify the path of your output CSV file here
    w = csv.writer(f)
    w.writerows(output_dict.items())
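
The question asks for the integers/floats that come after the keyword, while the search above takes the first run of digits anywhere on the line. As a minimal sketch (the helper `extract_values`, the `NUMBER` pattern, and the sample lines are illustrative assumptions, not part of the original program), the number can be anchored directly behind the keyword and converted to `int` or `float`:

```python
import re

# Assumed number format: optional sign, digits, optional decimal part
NUMBER = r"[-+]?\d+(?:\.\d+)?"

def extract_values(lines, keywords):
    """Return {keyword: number} for numbers that directly follow a keyword."""
    # One compiled pattern per keyword; re.escape makes the keyword match literally
    patterns = {
        kw: re.compile(re.escape(kw) + r"\s*(" + NUMBER + ")")
        for kw in keywords
    }
    found = {}
    for line in lines:
        for kw, pattern in patterns.items():
            match = pattern.search(line)
            if match:
                text = match.group(1)
                # Keep integers as int and decimals as float
                found[kw] = float(text) if "." in text else int(text)
    return found

# Hypothetical sample data standing in for the lines read from the file
sample = ["Room 12. His age is: 42\n", "His height is: 1.83 m\n", "no keyword here\n"]
result = extract_values(sample, ["His age is:", "His height is:"])
print(result)
```

Because the keyword is part of the pattern, the `12` that appears before the keyword on the first sample line is ignored.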

Upvotes: 1

Kris

Reputation: 8868

If you already have the keywords in a list, the following approach can help.

import re
from multiprocessing import Pool

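search_kwrds = ["His age is:", "His name is:"]  # add more keywords if you need
# Escape each keyword so that regex metacharacters in it are matched literally
search_regex = "|".join(re.escape(kwrd) for kwrd in search_kwrds)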

def read_search_text():
    with open("path_to_file.txt", 'r') as f:
        text = f.readlines()
        return text


def search(search_line):
    search_res = re.search(search_regex, search_line)
    if search_res:
        kwrd_found = search_res.group(0)
        # Guard against lines that contain a keyword but no digits
        number = re.search(r"\d+", search_line)
        if number:
            return {kwrd_found: int(number.group())}
    return {}


if __name__ == '__main__':

    search_lines = read_search_text()
    p = Pool(processes=1)  # increasing this may speed up the search on large files
    s_res = p.map(search,search_lines)
    search_results ={kwrd: suffix for d in s_res for kwrd, suffix in d.items()}
    print(search_results)

You can add more keywords to the list and search for them. This approach assumes that each line contains at most one keyword and that a keyword does not appear again on a later line; if a keyword does repeat, the last match overwrites the earlier ones.
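
If keywords can repeat, or several can occur on one line, one possible variation is to collect every match in a list per keyword. This is only a sketch under those assumptions: `search_all` and the sample lines are hypothetical names and data, and it runs in a single process rather than through `Pool`.

```python
import re
from collections import defaultdict

def search_all(lines, keywords):
    """Collect every integer that directly follows a keyword, grouped by keyword."""
    # Group 1 captures the keyword, group 2 the digits right after it
    pattern = re.compile(
        "(" + "|".join(re.escape(kw) for kw in keywords) + r")\s*(\d+)"
    )
    results = defaultdict(list)
    for line in lines:
        # findall returns one (keyword, digits) tuple per match on the line
        for kw, number in pattern.findall(line):
            results[kw].append(int(number))
    return dict(results)

# Hypothetical sample data standing in for the lines read from the file
sample_lines = ["His age is: 30, His number is: 555\n", "His age is: 31\n"]
hits = search_all(sample_lines, ["His age is:", "His number is:"])
print(hits)
```

Each keyword then maps to a list of values in the order they were found, so nothing is overwritten.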

Upvotes: 2
