Leon

Reputation: 6059

How can I capture return value with Python timeit module?

I'm running several machine learning algorithms with sklearn in a for loop and want to see how long each of them takes. The problem is that I also need the return value, and I don't want to run each algorithm more than once because each one takes so long. Is there a way to capture the return value 'clf' using Python's timeit module, or a similar one, with a function like this:

def RandomForest(train_input, train_output):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    return clf

when I call the function like this:

t = Timer(lambda: RandomForest(trainX, trainy))
print(t.timeit(number=1))

P.S. I also don't want to set a global 'clf', because I might want to do multithreading or multiprocessing later.

Upvotes: 42

Views: 21020

Answers (10)

Asher Stern

Reputation: 2506

You can create a callable class that wraps your function and captures its return value, like this:

class CaptureReturnValue:
    def __init__(self, func):
        self.func = func
        self.return_value = None

    def __call__(self, *args, **kwargs):
        self.return_value = self.func(*args, **kwargs)

Then call timeit like this (f1 and your_parameters stand for your own function and its arguments):

    crv = CaptureReturnValue(f1)
    elapsed_time = timeit.timeit(lambda: crv(your_parameters), number=1, globals=globals())
    print(crv.return_value)
    print(elapsed_time)

Note that the timing overhead is a little larger in this case because of the extra function calls.

Upvotes: 2

user15972

Reputation: 124

The original question wanted allowance for multiple results, multithreading, and multiprocessing. For all those, a queue will do the trick.

# put the result on the queue inside the function, via the module-level name resultq
def RandomForest(train_input, train_output):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    global resultq
    resultq.put(clf)
    return clf

# put the result to the queue inside the function, to a queue parameter
def RandomForest(train_input, train_output,resultq):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    resultq.put(clf)
    return clf

# put the result to the queue outside the function
def RandomForest(train_input, train_output):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    return clf


#usage:
#     global resultq
#     t=RandomForest(train_input, train_output)
#     resultq.put(t)

# in a timeit usage, add an import for the resultq into the setup.
setup="""
from __main__ import resultq
"""

# # in __main__  # #

# for multiprocessing and/or multithreading
import multiprocessing as mp
resultq = mp.Queue()  # plain module-level assignment; "global resultq = ..." is a syntax error

# Alternatively,

# for multithreading only
import queue
resultq = queue.Queue()  # no global keyword is needed at module level

#   do processing

# eventually, drain the queue

while not resultq.empty():
  aclf=resultq.get()
  print(aclf)
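
The fragments above can be combined into one small runnable sketch. The function train_model and its dictionary "model" are hypothetical stand-ins for RandomForest and clf, and this only covers the single-threaded timeit case with a queue passed in as a parameter:

```python
import queue
import timeit


def train_model(x, resultq):
    """Hypothetical stand-in for the slow training function."""
    model = {"weights": x * 2}  # pretend this step is expensive
    resultq.put(model)          # hand the result back through the queue
    return model


# queue.Queue works across threads; multiprocessing.Queue would be
# the choice for passing results between processes.
resultq = queue.Queue()

# timeit discards the return value, so the result travels via the queue.
elapsed = timeit.timeit(lambda: train_model(21, resultq), number=1)
model = resultq.get_nowait()

print(elapsed, model)
```

Because the queue is an argument rather than a global, each thread or worker could be given its own queue if that turns out to be convenient.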

Upvotes: 0

Antony Hatchkins

Reputation: 33974

As of 2020, in IPython or a Jupyter notebook, the -o flag makes %timeit return a TimeitResult object whose best attribute holds the timing:

t = %timeit -n1 -r1 -o RandomForest(trainX, trainy)
t.best

Upvotes: 6

Jerzy

Reputation: 760

If you don't want to monkey-patch timeit, you could try using a global list, as below. This will also work in Python 2.7, which doesn't have the globals argument in timeit():

from timeit import timeit
import time

# Function to time - plagiarised from answer above :-)
def foo():
    time.sleep(1)
    return 42

result = []
# print(...) with a single argument works in both Python 2.7 and Python 3
print(timeit('result.append(foo())', setup='from __main__ import result, foo', number=1))
print(result[0])

will print the time and then the result.

Upvotes: 2

Andrii Marusiak

Reputation: 31

For Python 3.x I use this approach:

import timeit

# Redefining default Timer template to make 'timeit' return
#     test's execution timing and the function return value
new_template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        ret_val = {stmt}
    _t1 = _timer()
    return _t1 - _t0, ret_val
"""
timeit.template = new_template
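
A minimal usage sketch, assuming Python 3.5 or later: once the template is replaced, Timer.timeit() returns a (time, value) tuple instead of a bare float. The function answer is a hypothetical stand-in, and restoring the original template afterwards is an addition made here so that other timeit users in the same process are not affected.

```python
import timeit

original_template = timeit.template  # keep the stock template for later

timeit.template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""


def answer():
    """Hypothetical stand-in for the function being timed."""
    return 42


try:
    # The Timer compiles its inner function from the patched template,
    # so timeit() now returns both the elapsed time and the last value.
    elapsed, value = timeit.Timer(answer).timeit(number=1)
finally:
    timeit.template = original_template  # undo the patch

print(elapsed, value)
```

With number greater than 1, only the value from the final iteration is returned.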

Upvotes: -1

Xavier

Reputation: 161

If I understand correctly, since Python 3.5 you can pass a globals namespace to each Timer instance without having to define the names in your own block of code. I am not sure whether it would have the same issues with parallelization.

My approach would be something like:

clf = ensemble.RandomForestClassifier(n_estimators=10)
myGlobals = globals()
myGlobals.update({'clf': clf})
t = Timer(stmt='clf.fit(trainX,trainy)', globals=myGlobals)
print(t.timeit(number=1))
print(clf)
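
The snippet above works because fit() changes clf in place. For a timed call that returns a new object, one possible variant is to pass a private dictionary as globals and collect the result in a list inside it. This is only a sketch: fit_model, results and namespace are hypothetical names, and it assumes Python 3.5 or later for the globals argument.

```python
import timeit


def fit_model():
    """Hypothetical stand-in for an expensive call that returns a value."""
    return sum(range(1000))


results = []

# A private namespace: nothing is added to the module's own globals.
namespace = {"fit_model": fit_model, "results": results}

elapsed = timeit.timeit(
    "results.append(fit_model())", globals=namespace, number=1
)
value = results[0]

print(elapsed, value)
```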

Upvotes: 3

ereynrs

Reputation: 1

An approach I use is to return the running time together with the result of the timed function. So, I write a very simple decorator using the "time" module:

import time

def timed(func):
    def func_wrapper(*args, **kwargs):
        # time.clock() was removed in Python 3.8; perf_counter() replaces it
        s = time.perf_counter()
        result = func(*args, **kwargs)
        e = time.perf_counter()
        # return a (result, seconds) pair; works for any return type
        return result, e - s
    return func_wrapper

And then I use the decorator for the function I want to time.
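
A small usage sketch of such a decorator, written here as a variant that returns a (result, elapsed) pair. The function add is a hypothetical example, and functools.wraps is an addition that keeps the wrapped function's name intact:

```python
import functools
import time


def timed(func):
    """Return (result, elapsed_seconds) instead of just the result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timed
def add(a, b):
    """Hypothetical function to time."""
    return a + b


# Callers unpack the pair; the function body runs exactly once.
value, elapsed = add(2, 3)
print(value, elapsed)
```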

Upvotes: 0

Brendan Cody-Kenny

Reputation: 454

For Python 3.5 and later, you can override the value of timeit.template:

timeit.template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""

unutbu's answer works for Python 3.4 but not 3.5, as the _template_func function appears to have been removed in 3.5.

Upvotes: 25

Hugh Perkins

Reputation: 8572

Funnily enough, I'm also doing machine-learning, and have a similar requirement ;-)

I solved it by writing a function that:

  • runs your function
  • prints the running time, along with the name of your function
  • returns the results

Let's say you want to time:

clf = RandomForest(train_input, train_output)

Then do:

clf = time_fn( RandomForest, train_input, train_output )

Stdout will show something like:

mymodule.RandomForest: 0.421609s

Code for time_fn:

import time

def time_fn(fn, *args, **kwargs):
    # time.clock() was removed in Python 3.8; perf_counter() replaces it
    start = time.perf_counter()
    results = fn(*args, **kwargs)
    end = time.perf_counter()
    fn_name = fn.__module__ + "." + fn.__name__
    print(fn_name + ": " + str(end - start) + "s")
    return results

Upvotes: 8

unutbu

Reputation: 879113

The problem boils down to timeit._template_func not returning the function's return value:

def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            _func()
        _t1 = _timer()
        return _t1 - _t0
    return inner

We can bend timeit to our will with a bit of monkey-patching:

import timeit
import time

def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            retval = _func()
        _t1 = _timer()
        return _t1 - _t0, retval
    return inner

timeit._template_func = _template_func

def foo():
    time.sleep(1)
    return 42

t = timeit.Timer(foo)
print(t.timeit(number=1))

This prints:

(1.0010340213775635, 42)

The first value is the timeit result (in seconds), the second value is the function's return value.

Note that the monkey-patch above only affects the behavior of timeit when a callable is passed to timeit.Timer. If you pass a string statement, then you'd have to (similarly) monkey-patch the timeit.template string.

Upvotes: 18
