
Reputation: 55

Negative process execution time in Python. How do I correctly measure this?

So I'm doing an assignment where I need to write two programs that use threading / multiprocessing. While the threading part went smoothly, I'm having trouble measuring the execution time of a process. Here's the code:

import multiprocessing as mp
import time 

start = time.perf_counter()
txtsmpl = open('C:\\dev\\OS\\dummy.txt', 'r').read()
processes_num = 5
to_replace = 'az'
replace_with = '{[]}'


def process_txt(start_time, inp_text):
    txt = list(inp_text)
    for i, letter in enumerate(txt):
        if letter in to_replace:
            txt[i] = replace_with
    txt = ''.join(txt)
    print(txt + '\n')
    print('Ran for ' + str(round(time.perf_counter() - start_time, 4)) + ' second(s)...\n')

def main():
    processes = []
    for _ in range(processes_num):
        p = mp.Process(target=process_txt, args = [time.perf_counter(), txtsmpl])
        p.start()
        processes.append(p)

    for process in processes:
        process.join()

    finish = time.perf_counter()
    print(f'Finished in {round(finish-start, 4)} second(s)')

if __name__ == "__main__":
    main()

This is the recommended method of measuring it (passing the start time as an argument to the thread/process). But with processes, I get negative execution times. Of course I understand how it can happen when they all run in parallel, but I don't know how to prevent it. I appreciate any help!

Upvotes: 0

Views: 458

Answers (2)

max wiklund

Reputation: 113

If you want to profile Python code, you can use cProfile:

import cProfile

def _a():
    a = 1
    while a < 1000:
        a += 1

def _b():
    a = 1
    while a < 100:
        a += 1

cProfile.run("_a()")
cProfile.run("_b()")

Output:

         4 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 __init__.py:4(_a)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


         4 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 __init__.py:9(_b)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



Process finished with exit code 0

Upvotes: 0

Booboo

Reputation: 44213

First, when you run under Windows, note that you must protect the code that creates sub-processes by ensuring it runs within a block controlled by if __name__ == '__main__':, as you have correctly done. The reason is that when a sub-process starts, the entire file is re-executed, and without the guard you would get into an infinite, recursive loop of creating new sub-processes. But the point here is that you also have code outside the if __name__ == '__main__': block that you neither need nor want executed by each sub-process, namely the opening and reading of the file. This code should be moved.

Now for your mystery. According to the manual on time.perf_counter():

Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide. The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid.

Pay attention to the last sentence above. Calls to this function are being made in parallel from multiple processes: the call that records the start time runs in the main process, and the call that records the end time runs in the sub-process. You are treating them as consecutive calls in the sense the documentation means, but there is no guarantee that they share a reference point, so the difference can come out negative. You really need to just use time.time():

import multiprocessing as mp
import time

to_replace = 'az'
replace_with = '{[]}'


def process_txt(start_time, inp_text):
    txt = list(inp_text)
    for i, letter in enumerate(txt):
        if letter in to_replace:
            txt[i] = replace_with
    txt = ''.join(txt)
    print(txt + '\n')
    print('Ran for ' + str(round(time.time() - start_time, 4)) + ' second(s)...\n')

def main():
    start = time.time()
    txtsmpl = open('C:\\dev\\OS\\dummy.txt', 'r').read()
    processes_num = 5
    processes = []
    for _ in range(processes_num):
        p = mp.Process(target=process_txt, args = [time.time(), txtsmpl])
        p.start()
        processes.append(p)

    for process in processes:
        process.join()

    finish = time.time()
    print(f'Finished in {round(finish-start, 4)} second(s)')

if __name__ == "__main__":
    main()
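To make the per-process reference point visible, here is a minimal sketch (not part of the answer's code; the offset behavior is platform-dependent, and on Windows each process's counter typically starts near that process's own launch):

```python
import multiprocessing as mp
import time


def report(q):
    # Each process may have its own perf_counter reference point,
    # so this value is not directly comparable to the parent's.
    q.put(time.perf_counter())


if __name__ == '__main__':
    q = mp.Queue()
    parent_now = time.perf_counter()
    p = mp.Process(target=report, args=(q,))
    p.start()
    child_now = q.get()
    p.join()
    # On Windows the child's counter can be far smaller than the
    # parent's, which is exactly how "negative" durations appear.
    print('parent:', parent_now, 'child:', child_now)
```

If the two printed values differ wildly on your machine, that is the undefined reference point at work; time.time() avoids the problem because it reports system-wide wall-clock time.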

Update

To get a better idea of the time it takes each sub-process to process the text, process_txt should initialize start_time with its own call to time.time(); and to get a better idea of how much time you are saving by using sub-processes, move the assignment start = time.time() in the main process to right before where the sub-processes are created:

import multiprocessing as mp
import time

to_replace = 'az'
replace_with = '{[]}'


def process_txt(inp_text):
    start_time = time.time()
    txt = list(inp_text)
    for i, letter in enumerate(txt):
        if letter in to_replace:
            txt[i] = replace_with
    txt = ''.join(txt)
    print(txt + '\n')
    print('Ran for ' + str(round(time.time() - start_time, 4)) + ' second(s)...\n')

def main():
    txtsmpl = open('C:\\dev\\OS\\dummy.txt', 'r').read()
    processes_num = 5
    processes = []
    start = time.time()
    for _ in range(processes_num):
        p = mp.Process(target=process_txt, args = [txtsmpl])
        p.start()
        processes.append(p)

    for process in processes:
        process.join()

    finish = time.time()
    print(f'Finished in {round(finish-start, 4)} second(s)')

if __name__ == "__main__":
    main()

You will probably find that the overhead of creating sub-processes for such short-running tasks far outweighs any benefits of using parallelism. This code would run much faster by just calling process_txt 5 times in a loop without creating sub-processes.
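For comparison, the serial version of the same work is just a plain loop; a minimal sketch (using inline sample text instead of the file, since C:\dev\OS\dummy.txt is machine-specific):

```python
import time

to_replace = 'az'
replace_with = '{[]}'


def process_txt(inp_text):
    # Replace every occurrence of 'a' or 'z' with the replacement string.
    txt = list(inp_text)
    for i, letter in enumerate(txt):
        if letter in to_replace:
            txt[i] = replace_with
    return ''.join(txt)


def main():
    sample = 'banana pizza ' * 1000  # stand-in for the file contents
    start = time.time()
    for _ in range(5):  # same 5 repetitions, no process overhead
        process_txt(sample)
    finish = time.time()
    print(f'Finished in {round(finish - start, 4)} second(s)')


if __name__ == '__main__':
    main()
```

Timing both versions on the same input should show the process-creation overhead dominating for work this small.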

Upvotes: 1
