Reputation: 1157

Python get line number in file

I built a Python (2.7) script that parses a txt file with this code:

cnt = 1

logFile = open( logFilePath, 'r' )

for line in logFile:
    if errorCodeGetHostName in line:
        errorHostNameCnt = errorHostNameCnt + 1
        errorGenericCnt = errorGenericCnt + 1
        reportFile.write( "--- Error: GET HOST BY NAME @ line " + str( cnt ) + "\n\r" )
        reportFile.write( line )


    elif errorCodeSocke462 in line:
        errorSocket462Cnt = errorSocket462Cnt + 1
        errorGenericCnt = errorGenericCnt + 1
        reportFile.write("--- Error: SOCKET -462 @ line " + str(cnt) + "\n\r" )
        reportFile.write(line)


    elif errorCodeMemory in line:
        errorMemoryCnt = errorMemoryCnt + 1
        errorGenericCnt = errorGenericCnt + 1
        reportFile.write("--- Error: MEMORY NOT RELEASED @ line " + str(cnt) + "\n\r" )
        reportFile.write(line)

    cnt = cnt + 1

I want to report the line number of each error, so I added a counter (cnt), but its value does not match the real line number.

This is a piece of my log file:

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2017.06.13 17:05:43 =~=~=~=~=~=~=~=~=~=~=~=
UTC Time fetched from server #1: '0.pool.ntp.org'


   *** Test (cycle #1) starting...
   --- Test01 completed successfully!
   --- Test02 completed successfully!
   --- Test03 completed successfully!
   --- Test04 completed successfully!
   --- Test01 completed successfully!
   --- Test02 completed successfully!
INF:[CONFIGURATION] Completed
   --- Test03 completed successfully!
Firmware Version: 0.0.0


   *** Test (cycle #1) starting...

How can I get the real line number?

Thanks for the help.

Upvotes: 2

Answers (2)

Maarten Fabré

Reputation: 7058

Apart from the line-ending issue, there are some other issues with this code.

Filehandles

As remarked in one of the comments, it is best to open files with a with-statement.
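A minimal sketch of the with-statement form. The temporary file and its contents are made up for illustration and stand in for the real log:

```python
import os
import tempfile

# Hypothetical stand-in for the real log file.
tmp_dir = tempfile.mkdtemp()
log_file_path = os.path.join(tmp_dir, "example.log")
with open(log_file_path, "w") as log_file:
    log_file.write("first line\nsecond line\n")

# The file is closed automatically when the block ends,
# even if an exception is raised inside it.
with open(log_file_path, "r") as log_file:
    lines = list(log_file)

print(log_file.closed)  # True
print(len(lines))       # 2
```

Compared with a bare open(), there is no close() call to forget.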

Separation of functions

Right now you have one big loop that reads the original file, parses it and immediately writes to the report file. I think it would be best to separate those steps.

Make one function that loops over the log and returns the details you need, and a second function that loops over these details and writes them to a report. This is a lot more robust, and easier to debug and test when something goes wrong.

I would also keep the IO outside these functions as much as possible. If you later want to stream to a socket or something similar, this can be done easily.

DRY

Lines 6 to 24 of your code contain many lines that are almost the same, and each additional error you want to report costs another 5 nearly identical lines. I would use a dict and a for-loop to cut down on the boilerplate code.

Pythonic

A smaller remark is that you don't use the handy things Python offers, like yield, the with-statement, enumerate or collections.Counter. Also, the variable naming does not follow PEP 8, but that is mainly aesthetic.
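As a rough illustration of two of these helpers, the sketch below numbers lines with enumerate and tallies matches with collections.Counter. The log lines and the codes "ERR_A" and "ERR_B" are invented for the example:

```python
import collections

# Made-up log lines; "ERR_A" and "ERR_B" are hypothetical error codes.
lines = ["ok\n", "ERR_A here\n", "ok\n", "ERR_B here\n", "ERR_A again\n"]

counts = collections.Counter()
hits = []
# enumerate(..., 1) numbers the lines from 1, replacing a hand-maintained counter.
for line_no, line in enumerate(lines, 1):
    for code in ("ERR_A", "ERR_B"):
        if code in line:
            counts[code] += 1  # missing keys start at 0, no initialisation needed
            hits.append((line_no, code))

print(hits)                              # [(2, 'ERR_A'), (4, 'ERR_B'), (5, 'ERR_A')]
print(counts["ERR_A"], counts["ERR_B"])  # 2 1
```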

My attempt

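# Keys are the strings searched for in each log line (the question's
# errorCodeGetHostName, errorCodeSocke462 and errorCodeMemory, renamed per PEP 8).
errors = {
    error_code_hostname: {'error_msg': '--- Error: GET HOST BY NAME @ line %i\n'},
    error_code_socket_462: {'error_msg': '--- Error: SOCKET -462 @ line %i\n'},
    error_code_memory: {'error_msg': '--- Error: MEMORY NOT RELEASED @ line %i\n'},
    }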

Here you define which errors can occur and what each error message should look like.

def get_events(log_filehandle):
    # Start counting at 1 so that line_no matches what an editor shows
    for line_no, line in enumerate(log_filehandle, 1):
        for error_code in errors:
            if error_code in line:
                yield line_no, error_code, line

This takes a filehandle (it can be a stream or buffer too) and looks for the error codes in each line. If it finds one, it yields the error code together with the line number and the line.
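Because the function only needs something that yields lines, it can be tried out without touching the disk. The sketch below feeds it an io.StringIO. The error codes "ERR_HOST" and "ERR_SOCKET" and the log text are assumptions made up for the example, and the line count starts at 1:

```python
import io

# Hypothetical error codes and messages, for illustration only.
errors = {
    "ERR_HOST": {"error_msg": "--- Error: GET HOST BY NAME @ line %i\n"},
    "ERR_SOCKET": {"error_msg": "--- Error: SOCKET -462 @ line %i\n"},
}

def get_events(log_filehandle):
    # Works on anything that yields lines: an open file, a StringIO, a list.
    for line_no, line in enumerate(log_filehandle, 1):
        for error_code in errors:
            if error_code in line:
                yield line_no, error_code, line

fake_log = io.StringIO(u"boot ok\nERR_HOST lookup failed\nidle\nERR_SOCKET closed\n")
events = list(get_events(fake_log))
print(events)
```

The same call works unchanged on a real file opened with a with-statement.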

import collections  # normally placed at the top of the script

def generate_report(report_filehandle, error_list):
    error_counter = collections.Counter()
    for line_no, error_code, error_line in error_list:
        error_counter['generic'] += 1
        error_counter[error_code] += 1

        error_msg = format_error_msg(line_no, error_code)
        report_filehandle.write(error_msg)
        report_filehandle.write(error_line)
    return error_counter

This loops over the found errors. It increments the counters, formats the message and writes it to the report file.

def format_error_msg(line_no, error_code):
    return errors[error_code]['error_msg'] % line_no

This uses string-formatting to generate a message from an error_code and line_no

with open(log_filename, 'r') as log_filehandle, open(report_filename, 'w') as report_filehandle:
    error_list = get_events(log_filehandle)
    error_counter = generate_report(report_filehandle, error_list)

This ties it all together. You could use the error_counter to add a summary to the report, or write a summary to another file or database.
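One possible shape for such a summary, sketched for Python 3. The function name write_summary and the counts are hypothetical. The Counter only mimics what the reporting step returns:

```python
import collections
import io

def write_summary(report_filehandle, error_counter):
    """Write one 'name: count' line per counter entry, most frequent first."""
    report_filehandle.write("--- Summary\n")
    for name, count in error_counter.most_common():
        report_filehandle.write("%s: %i\n" % (name, count))

# Made-up counts, shaped like the Counter returned by the reporting step.
error_counter = collections.Counter({"generic": 3, "ERR_SOCKET": 2, "ERR_MEMORY": 1})
report = io.StringIO()  # stands in for the open report file
write_summary(report, error_counter)
print(report.getvalue())
```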

This approach has the advantage that if your error recognition changes, you can change it independently of the reporting, and vice versa.

Upvotes: 1

Federico

Reputation: 1157

Intro: the log that I want to parse comes from an embedded platform programmed in C.

I found that somewhere in the embedded code there are printf calls that use \n\r instead of \r\n. I replaced each \n\r with \r\n, which corresponds to the Windows CR LF.

With this change the Python script works, and I can identify each error by its line.
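A plausible explanation for the mismatch, shown on a made-up three-line string: counting only "\n" sees one line per "\n\r", while anything that also treats a lone "\r" as a line break (as str.splitlines does, and as some editors do) sees two.

```python
# Made-up three-line log with the reversed "\n\r" endings.
reversed_endings = "first\n\rsecond\n\rthird\n\r"
# The same text after the fix, with Windows "\r\n" endings.
windows_endings = reversed_endings.replace("\n\r", "\r\n")

# Counting "\n" only: one line per ending.
count_by_newline = reversed_endings.count("\n")
# splitlines() also breaks on a lone "\r", so every "\n\r" yields an extra empty line.
count_by_any_break = len(reversed_endings.splitlines())
# With "\r\n" both ways of counting agree.
count_after_fix = len(windows_endings.splitlines())

print(count_by_newline)    # 3
print(count_by_any_break)  # 6
print(count_after_fix)     # 3
```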

Upvotes: 0