user10083444

Reputation: 105

Extracting a particular text from a log file using regex

I have the following log file

2020-06-30 12:44:06,608 DEBUG [main] [apitests.ApiTest] Reading of Excel File Started
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] The Keyword's Entered : Asus Laptop
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] No of Keywords Entered = 1
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] Response Code from API : 200
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] Time Taken : 1959 milliseconds
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] The Result Obtained from API is : {"keywords": {"Asus Laptop": ["Premium grade"]}}
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] --------------------------------------------------------------------------------------
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] The Keyword's Entered : Intext Hardrive
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] No of Keywords Entered = 1
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] Response Code from API : 200
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] Time Taken : 243 milliseconds
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] The Result Obtained from API is : {"keywords": {"Intext Hardrive": ["Medium grade"]}}
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] --------------------------------------------------------------------------------------

My goal is to extract just the values ["Premium grade"], ["Medium grade"], and so on. Basically, the value stored under each keyword.

I wrote the code below.

import re
with open('quality.log', 'r') as text_file:
    text_file=text_file.read()  
    for line in text_file :  
        matches=re.findall(r"\[(.*?)\]", line)[0]
with open('qualitygrade.txt', 'w') as out:
    out.write('\n'.join(matches))

The goal of re.findall(r"\[(.*?)\]", line)[0] is to extract just "Premium grade", "Medium grade", etc.

I'm not sure what I am doing wrong. My output file is blank. Any help, please?

Upvotes: 0

Views: 1287

Answers (2)

The fourth bird

Reputation: 163207

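You don't need the for loop, as you are reading the whole file at once. After read(), text_file is a single string, so the loop walks over it one character at a time rather than line by line.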

Your code could be:

import re

with open('quality.log', 'r') as text_file:
    text = text_file.read()  # whole file as a single string
    matches = re.findall(r'\["(.*?)"]', text)

print('\n'.join(matches))

If you want to get the values between the double quotes, you should add them to the pattern.

\["(.*?)"]

Output

Premium grade
Medium grade
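
To see why the quotes matter, here is a small sketch that runs both patterns against one line copied from the log in the question. The variable names are only for illustration, and it assumes each result list holds a single quoted value, as in the sample log.

```python
import re

# One line copied from the log in the question.
line = ('2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] '
        'The Result Obtained from API is : '
        '{"keywords": {"Asus Laptop": ["Premium grade"]}}')

# Original pattern: any bracketed text, so the logger fields match too.
loose = re.findall(r"\[(.*?)\]", line)

# Pattern with quotes: only brackets that directly wrap a quoted string.
strict = re.findall(r'\["(.*?)"]', line)

print(loose)   # ['main', 'apitests.ApiTest', '"Premium grade"']
print(strict)  # ['Premium grade']
```

The first pattern also picks up [main] and [apitests.ApiTest], and it keeps the quotes around the grade; the second one skips both logger fields because they do not start with a double quote.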

Upvotes: 1

user13843527

Reputation: 11

This for loop overwrites matches on every iteration:

for line in text_file :  
    matches=re.findall(r"\[(.*?)\]", line)[0]

You need to either (a) write to the output file as you find the matches or (b) accumulate the matches in a list. Option (b) would look something like this:

import re

matches = []

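with open('quality.log', 'r') as text_file:
    # Iterate over the file object itself to get one line at a time;
    # calling read() first would make the loop walk over single characters.
    for line in text_file:
        matches += re.findall(r"\[.*?\]", line)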

with open('qualitygrade.txt', 'w') as out:
    out.write('\n'.join(matches))

Also, you need to fix your regex, since the one you are currently using will also match other tokens in your log, such as [main] and [apitests.ApiTest].
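
As a rough sketch of option (b) with a tighter pattern: the version below requires a double quote right inside the brackets, so the logger fields are skipped. The names GRADE_PATTERN, extract_grades, and write_grades are made up for this example, and the pattern assumes one quoted value per result list, as in the log shown in the question.

```python
import re

# Requires a double quote right inside the brackets, so logger fields
# such as [main] do not match. Assumes one quoted value per list.
GRADE_PATTERN = re.compile(r'\["(.*?)"\]')

def extract_grades(lines):
    """Collect every bracketed, quoted value from an iterable of lines."""
    matches = []
    for line in lines:
        matches.extend(GRADE_PATTERN.findall(line))
    return matches

def write_grades(log_path, out_path):
    """Read the log line by line and write one grade per output line."""
    with open(log_path, 'r') as text_file:
        # Iterating over the file object yields whole lines.
        grades = extract_grades(text_file)
    with open(out_path, 'w') as out:
        out.write('\n'.join(grades))

# Small in-memory check using two lines in the format of the question's log.
sample = [
    '2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] Response Code from API : 200',
    '2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] The Result Obtained from API is : {"keywords": {"Intext Hardrive": ["Medium grade"]}}',
]
print(extract_grades(sample))  # ['Medium grade']
```

With the file names from the question, the call would be write_grades('quality.log', 'qualitygrade.txt').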

Upvotes: 1
