Dr. Chocolate

Reputation: 2165

Benchmarking run times in Python

I have to benchmark JSON serialization time and compare it with the serialization times of Thrift and Google's Protocol Buffers. It also has to be done in Python.

I was planning on using the Python profilers. http://docs.python.org/2/library/profile.html

Would the profiler be the best way to find function runtimes? Or would outputting a timestamp before and after the function call be the better option?

Or is there an even better way?

Upvotes: 3

Views: 7664

Answers (2)

abarnert

Reputation: 366073

From the profile docs that you linked to:

Note The profiler modules are designed to provide an execution profile for a given program, not for benchmarking purposes (for that, there is timeit for reasonably accurate results). This particularly applies to benchmarking Python code against C code: the profilers introduce overhead for Python code, but not for C-level functions, and so the C code would seem faster than any Python one.

So, no, you do not want to use profile to benchmark your code. What you want to use profile for is to figure out why your code is too slow, after you already know that it is.

And you do not want to output a timestamp before and after the function call, either. There are just way too many things you can get wrong that way if you're not careful (using the wrong timestamp function, letting the GC run a cycle collection in the middle of your test run, including test overhead in the loop timing, etc.), and timeit takes care of all of that for you.
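To make those pitfalls concrete, here is a rough sketch of what a careful hand-rolled timer would have to do. The name `manual_time` is made up for illustration, and this is only an approximation of the precautions timeit takes, not its actual implementation:

```python
import gc
import time


def manual_time(func, number=10000):
    """Time `number` calls of func() with some of the precautions timeit takes."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep cycle collection out of the measurement
    try:
        start = time.perf_counter()  # monotonic, high-resolution clock
        for _ in range(number):
            func()
        # Note: this total still includes the overhead of the loop itself.
        elapsed = time.perf_counter() - start
    finally:
        if gc_was_enabled:
            gc.enable()  # restore the caller's GC state
    return elapsed
```

Even this version leaves the loop overhead in the result, which is one more reason to just use timeit.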

Something like this is a common way to benchmark things:

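import timeit

# Each module is assumed to expose the same serialize(data) function.
SETUP = '''\
from {} import serialize
with open('data.txt') as f:
    data = f.read()
'''

for impl in 'mycode', 'googlecode', 'thriftcode':
    # The setup code runs once and is not timed; only the statement is.
    t = timeit.timeit('serialize(data)', setup=SETUP.format(impl), number=10000)
    print('{}: {}'.format(impl, t))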

(I'm assuming here that you can write three modules that wrap the three different serialization tools in the same API, a single serialize function that takes a string and does something or other with it. Obviously there are different ways to organize things.)
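As one possible way to organize it, the wrappers can live in a single file as plain functions and be passed to timeit as callables. The sketch below only wraps the standard-library json module. `json_serialize`, `benchmark`, and the sample payload are made-up names and data, and the Thrift and Protocol Buffers wrappers are left out because they depend on your schema:

```python
import json
import timeit


def json_serialize(data):
    """Stand-in for one wrapper; the others would share this signature."""
    return json.dumps(data)


def benchmark(serializers, data, number=10000):
    """Return {name: seconds} for `number` calls of each serializer on data."""
    results = {}
    for name, serialize in serializers.items():
        # timeit accepts a zero-argument callable in place of a statement string.
        results[name] = timeit.timeit(lambda: serialize(data), number=number)
    return results


if __name__ == '__main__':
    sample = {'id': 1, 'tags': ['a', 'b'], 'score': 3.5}  # made-up payload
    for name, seconds in benchmark({'json': json_serialize}, sample).items():
        print('{}: {:.4f}s'.format(name, seconds))
```

The lambda adds a small, constant call overhead to every measurement, so this is best used for comparing implementations with each other rather than for absolute numbers.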

Upvotes: 5

DanGar

Reputation: 3078

You should be careful when timing Python code based on a timestamp taken at the start and end of the program. This does not account for other processes that might be running concurrently.

Instead, you should consider looking at the question "Is there any simple way to benchmark python script?"
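One common way to reduce the influence of other processes is to repeat the measurement several times and keep the fastest run, which the standard-library `timeit.repeat` makes easy. A minimal sketch, with a made-up payload and arbitrary repeat counts:

```python
import json
import timeit

payload = {'id': 1, 'values': list(range(100))}  # made-up sample data

# Five independent runs of 1000 calls each.
runs = timeit.repeat(lambda: json.dumps(payload), repeat=5, number=1000)

# The minimum is the run least disturbed by whatever else the machine was doing.
best = min(runs)
print('best of {}: {:.6f}s per 1000 calls'.format(len(runs), best))
```

Slower runs mostly reflect interference from the rest of the system rather than the code itself, so the minimum is usually more informative than the mean.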

Upvotes: 2
