Reputation: 1157
I built a Python (2.7) script that parses a text file with this code:
cnt = 1
logFile = open( logFilePath, 'r' )
for line in logFile:
    if errorCodeGetHostName in line:
        errorHostNameCnt = errorHostNameCnt + 1
        errorGenericCnt = errorGenericCnt + 1
        reportFile.write( "--- Error: GET HOST BY NAME @ line " + str( cnt ) + "\n\r" )
        reportFile.write( line )
    elif errorCodeSocke462 in line:
        errorSocket462Cnt = errorSocket462Cnt + 1
        errorGenericCnt = errorGenericCnt + 1
        reportFile.write("--- Error: SOCKET -462 @ line " + str(cnt) + "\n\r" )
        reportFile.write(line)
    elif errorCodeMemory in line:
        errorMemoryCnt = errorMemoryCnt + 1
        errorGenericCnt = errorGenericCnt + 1
        reportFile.write("--- Error: MEMORY NOT RELEASED @ line " + str(cnt) + "\n\r" )
        reportFile.write(line)
    cnt = cnt + 1
I want to add the line number of each error, and for this purpose I added a counter (cnt), but its value does not match the real line number.
This is a piece of my log file:
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2017.06.13 17:05:43 =~=~=~=~=~=~=~=~=~=~=~=
UTC Time fetched from server #1: '0.pool.ntp.org'
*** Test (cycle #1) starting...
--- Test01 completed successfully!
--- Test02 completed successfully!
--- Test03 completed successfully!
--- Test04 completed successfully!
--- Test01 completed successfully!
--- Test02 completed successfully!
INF:[CONFIGURATION] Completed
--- Test03 completed successfully!
Firmware Version: 0.0.0
*** Test (cycle #1) starting...
How can I get the real line number?
Thanks for the help.
Upvotes: 2
Views: 2629
Reputation: 7058
Apart from the line-ending issue, there are some other issues with this code.
As remarked in one of the comments, it is best to open files with a with-statement.
Right now you have one big loop that reads the original file, parses it and immediately writes to the report file. I think it would be best to separate those.
Make one function that loops over the log and returns the details you need, and a second function that loops over these details and writes them to a report. This is a lot more robust, and easier to debug and test when something goes wrong.
I would also keep the IO as far outside as possible. If you later want to stream to a socket or something similar, this can easily be done.
Lines 6 to 24 of your code contain a lot of lines that are almost the same, and if you want to report another error, you need to add another 5 nearly identical lines of code. I would use a dict and a for-loop to cut down on the boilerplate code.
A smaller remark is that you don't use the handy things Python offers, like yield, the with-statement, enumerate or collections.Counter.
Also, the variable naming does not follow PEP-8, but that is mainly aesthetic.
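# The keys are the error-code strings searched for in the log
# (errorCodeGetHostName, errorCodeSocke462 and errorCodeMemory in the question).
errors = {
    error_code_get_host_name: {'error_msg': '--- Error: GET HOST BY NAME @ line %i'},
    error_code_socket_462: {'error_msg': '--- Error: SOCKET -462 @ line %i'},
    error_code_memory: {'error_msg': '--- Error: MEMORY NOT RELEASED @ line %i'},
}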
Here you define what errors can occur and what the error message should look like
def get_events(log_filehandle):
    # Start counting at 1 so line_no matches the line number shown in an editor
    for line_no, line in enumerate(log_filehandle, 1):
        for error_code in errors:
            if error_code in line:
                yield line_no, error_code, line
This just takes a filehandle (it can be a stream or buffer too) and looks for the error codes in it. If it finds one, it yields the error code together with the line number and the line.
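To illustrate the point about streams, here is a minimal sketch that feeds an in-memory text stream to the generator instead of a real file. The error codes `ERR_HOST` and `ERR_MEM` and the log text are made up for the example (the real strings are not shown in the question), and the counter is started at 1 so the numbers match what an editor shows.

```python
import io

# Hypothetical error codes; the real strings are not shown in the question.
ERRORS = {
    "ERR_HOST": {"error_msg": "--- Error: GET HOST BY NAME @ line %i"},
    "ERR_MEM": {"error_msg": "--- Error: MEMORY NOT RELEASED @ line %i"},
}

def get_events(log_filehandle):
    """Yield (line_no, error_code, line) for each line containing a known code."""
    for line_no, line in enumerate(log_filehandle, 1):  # line numbers start at 1
        for error_code in ERRORS:
            if error_code in line:
                yield line_no, error_code, line

# An in-memory text stream stands in for the opened log file.
fake_log = io.StringIO(u"boot ok\nERR_HOST lookup failed\ntest ok\nERR_MEM leak\n")
events = list(get_events(fake_log))

for line_no, error_code, line in events:
    print("%d %s" % (line_no, error_code))
```

Because the function only iterates over whatever it is given, the same code can be tested without touching the disk.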
import collections

def generate_report(report_filehandle, error_list):
    error_counter = collections.Counter()
    for line_no, error_code, error_line in error_list:
        error_counter['generic'] += 1
        error_counter[error_code] += 1
        error_msg = format_error_msg(line_no, error_code)
        # End the message with a newline so the log line starts on its own line
        report_filehandle.write(error_msg + '\n')
        report_filehandle.write(error_line)
    return error_counter
This loops over the found errors. It increases the counters, formats the message and writes it to the report file.
def format_error_msg(line_no, error_code):
    return errors[error_code]['error_msg'] % line_no
This uses string-formatting to generate a message from an error_code and line_no
with open(log_filename, 'r') as log_filehandle, open(report_filename, 'w') as report_filehandle:
    error_list = get_events(log_filehandle)
    error_counter = generate_report(report_filehandle, error_list)
This ties it all together. You could use the error_counter to add a summary to the report, or write a summary to another file or database.
This approach has the advantage that if your error recognition changes, you can change it independently of the reporting, and vice versa.
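As a rough sketch of the summary idea, the counter can be written out most-frequent first. The helper name `write_summary`, the heading text and the sample counts are all invented for illustration; an in-memory stream stands in for the report file.

```python
import collections
import io

def write_summary(report_filehandle, error_counter):
    """Write one line per counter entry, most frequent first."""
    report_filehandle.write(u"=== Summary ===\n")
    for name, count in error_counter.most_common():
        report_filehandle.write(u"%s: %d\n" % (name, count))

# Sample counts, as a report function like the one above might return them.
error_counter = collections.Counter({"generic": 3, "ERR_HOST": 2, "ERR_MEM": 1})

report = io.StringIO()  # stands in for the opened report file
write_summary(report, error_counter)
summary_text = report.getvalue()
print(summary_text)
```

Since the summary only needs the counter, it could just as well go to a different file than the detailed report.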
Upvotes: 1
Reputation: 1157
Intro: the log that I want to parse comes from an embedded platform programmed in C.
I found that somewhere in the embedded code there are printf calls with \n\r instead of \r\n. I replaced each \n\r with \r\n, which corresponds to the Windows CR LF line ending.
With this change the Python script works, and I can identify each error by its line number.
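For anyone hitting the same thing, here is a small sketch of one plausible explanation (an assumption, not something I verified against every editor): a tool that treats a lone \r as a line break sees two breaks in every \n\r, while a count based on \n alone sees one. Python's str.splitlines() recognises \n, \r and \r\n as line boundaries, so it can stand in for such a tool. The sample text and the ERR_HOST code are made up.

```python
# Hypothetical excerpt: three records, each ended by "\n\r" (LF CR).
raw = "boot ok\n\rERR_HOST lookup failed\n\rtest ok\n\r"
# The same excerpt after swapping every "\n\r" for "\r\n" (CR LF).
fixed = raw.replace("\n\r", "\r\n")

def first_line_with(text, needle):
    """Return the 1-based number of the first line containing needle, or None."""
    for line_no, line in enumerate(text.splitlines(), 1):
        if needle in line:
            return line_no
    return None

lf_count = raw.count("\n")             # what a "\n"-only counter sees
lines_raw = len(raw.splitlines())      # lone "\r" counted as an extra break
lines_fixed = len(fixed.splitlines())  # "\r\n" counted as a single break

print("%d %d %d" % (lf_count, lines_raw, lines_fixed))
print("%s %s" % (first_line_with(raw, "ERR_HOST"), first_line_with(fixed, "ERR_HOST")))
```

With the \n\r endings the two ways of counting disagree, and after the replacement they agree again.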
Upvotes: 0