Reputation: 3555
Hi everyone! I am trying to test the creation performance of class instances (each of which carries an attribute dictionary), but I get some weird results. I used three different methods to measure the time it takes to create a million such objects in Python. The first method uses the time module; I know it is not accurate. The test file is "node_time.py":
from __future__ import print_function
from time import time

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0

begin = time()
content = [node(i, i) for i in range(1000000)]
print(time() - begin)
The second method uses the timeit module, which should be a much better choice. The test file is "node_timeit.py":
from __future__ import print_function
from timeit import repeat

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0

cmd = "content = [node(i, i) for i in range(1000000)]"
prepare = "from __main__ import node"
cost = min(repeat(cmd, prepare, repeat=1, number=1))
print(cost)
The third method is to use the system command time on Linux. The test file is "node_sys.py":
from __future__ import print_function

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0

content = [node(i, i) for i in range(1000000)]
Finally, the results are quite different:
-bash-4.2$ python2 node_time.py
5.93654894829
-bash-4.2$ python2 node_timeit.py
2.6723048687
-bash-4.2$ time python2 node_sys.py
real 0m8.587s
user 0m7.344s
sys 0m0.716s
The result from the time module method (which measures wall-clock time) should be greater than the CPU time, but with the Linux command time, the sum of the user and sys CPU times comes to as much as 8.060 s. Which result is the correct one? And why are they so different? Thanks for any comment!
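Edit: for reference, wall-clock time and CPU time can also be compared in a single run with something like this rough sketch (not one of the three methods above; under Python 2 on Linux, time.clock() reports the process's CPU time while time.time() reports wall-clock time):

from __future__ import print_function
from time import time, clock

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0

# Record both clocks around the same construction.
wall, cpu = time(), clock()
content = [node(i, i) for i in range(1000000)]
print("wall:", time() - wall, "cpu:", clock() - cpu)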
Upvotes: 0
Views: 627
Reputation: 281624
The difference between the time and timeit timings comes from the fact that, by default, timeit() temporarily turns off garbage collection during the timing. When you allocate a lot of memory, the cyclic garbage collector will normally kick in to see if it can reclaim some of that. To get more consistent timings, timeit disables this behavior for the duration of the timing.

Compare the timings with time, with and without garbage collection:
>>> import time, gc, timeit   # node is the class defined in the question
>>> def t1():
...     s = time.time()
...     content = [node(i, i) for i in range(1000000)]
...     print time.time() - s
...
>>> t1()
3.27300000191
>>> gc.disable()
>>> t1()
1.92200016975
to the timings with timeit, with and without garbage collection:
>>> gc.enable()
>>> timeit.timeit('content = [node(i, i) for i in range(1000000)]',
...               'from __main__ import node; import gc; gc.enable()', number=1)
3.2806941528164373
>>> timeit.timeit('content = [node(i, i) for i in range(1000000)]',
...               'from __main__ import node', number=1)
1.8655694847876134
As you can see, both methods produce the same timing with the same GC settings.
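(To double-check which GC setting is active at a given point in such a session, you can query gc.isenabled(); a small aside, continuing the session above where gc.enable() was the last call:)

>>> gc.isenabled()
True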
As for the command-line time command, that includes the entire runtime of the program, including interpreter setup and teardown and other parts that the other timings don't include. I suspect one of the big contributors to the difference is the time taken to free all the node objects you allocated:
>>> def t2():
...     s = time.time()
...     [node(i, i) for i in range(1000000)]
...     # List and contents are deallocated
...     print time.time() - s
...
>>> t2()
3.96099996567
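One way to isolate that deallocation cost is to time only the del of the list, as in this sketch (same node class as in the question's files; the exact numbers will of course vary by machine):

from __future__ import print_function
import time

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0

content = [node(i, i) for i in range(1000000)]
start = time.time()
del content  # drop the only reference; the list and all one million nodes are freed here
print(time.time() - start)

Interpreter startup and teardown can be estimated separately from the shell, e.g. with time python2 -c 'pass', and subtracted from the time output for the full script.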
Upvotes: 4