Reputation: 27855

Hunt for root cause of flaky test in Python

There is a flaky test, and we have no clue what the root cause could be.

with pytest.raises(foo.geocoder.GeocodingFailed):
    foo.geocoder.geocode('not a valid place')

Sometimes the exception does not happen.

I looked at the docs how to handle failures, but this did not help.

How to trace geocode() so that if the exception does not happen, I see a trace?

I looked at the trace module of the standard library. But there seems to be no easy way to get the trace as string.

With "a trace" I mean: A trace of all lines which got executed during gecode(). I would like to see method calls and returns statements with indentation. I want to ignore lines from the Python standard library.

AFAIK a debugger like pdb does not help here, since the test only fails when it gets run in CI, and only once or twice per month.

Upvotes: 9

Answers (3)

Nizam Mohamed

Reputation: 9230

The trace module gives programmatic access via trace.Trace class.
On test failures the Trace class' console output is visible.
And it has coverage report to write on chosen path.

I exclude sys.base_exec_prefix, sys.base_prefix to not trace Python library modules.

import pytest


def f(x):
    import random

    choice = random.choice((True, False))

    if choice:
        raise ValueError
    else:
        return x * 2


def trace(f, *args, **kwargs):
    import trace
    import sys

    tracer = trace.Trace(
        ignoredirs=(sys.base_exec_prefix, sys.base_prefix), timing=True,
    )
    ret = tracer.runfunc(f, *args, **kwargs)
    r = tracer.results()
    r.write_results(coverdir="/tmp/xx-trace")
    return ret


def test_f():
    with pytest.raises(ValueError):
        trace(f, 3)

Coverage report;
Lines marked with >>>>>> were not executed i.e. not traced and the numbers with colon are execution counts.

>>>>>> import pytest
       
       
>>>>>> def f(x):
    1:     import random
       
    1:     choice = random.choice((True, False))
       
    1:     if choice:
>>>>>>         raise ValueError
           else:
    1:         return x * 2
       
       
>>>>>> def trace(f, *args, **kwargs):
>>>>>>     import trace
>>>>>>     import sys
       
>>>>>>     tracer = trace.Trace(
>>>>>>         ignoredirs=(sys.base_exec_prefix, sys.base_prefix), timing=True,
           )
>>>>>>     tracer.runfunc(f, *args, **kwargs)
>>>>>>     r = tracer.results()
>>>>>>     r.write_results(coverdir="/tmp/xx-trace")
       
       
>>>>>> def test_f():
>>>>>>     with pytest.raises(ValueError):
>>>>>>         trace(f, 3)

Upvotes: 4

Mark

Reputation: 1983

Have you tried using pytest-repeat with --pdb.

Something like pytest test_geocode.py --count 10 --pdb. This could work if your CI works in a docker container and you can connect to the container.

Alternatively, Jenkins allows you to pause on failure and then investigate that way, but since it's flaky and only happens once or twice a month, then literally adding the --pdb flag to be part of the CI so that you can connect whenever it happens. You would need a way to notify you that it has entered this flow though otherwise the build may time out. But if you keep a close eye on your builds it may not be an issue.

Additionally, if you're not already using pytest-html it may be worth integrating to get a nice report of what went on (via log lines) and what failed or just using

pytest test_geocode.py --log-file=/path/to/log/file --log-file-level=DEBUG.

Attaching either the pytest html report and or the log file as artefacts to failed builds will allow for better debugging.

Upvotes: 0

Stefan Sonnenberg-Carstens

Reputation: 515

The trace library does not help, as it does not enable writing to a string or a StringIO object, it only writes to real files.

What you can do is to use sys.settrace() and define a simple function, which is called for every execution. You will find the documentation here: https://docs.python.org/3/library/sys.html#sys.settrace. The basic magic is to get the details out of the frameobject, which is documented here: https://docs.python.org/3/library/inspect.html.

A sample to give you an idea looks like this:

import sys


def tracer(frame, event, arg):
    print(frame.f_code.co_name, frame.f_code.co_filename, frame.f_lineno, event, arg)


def bad_function(param: int):
    if param == 20:
        raise RuntimeError(f'that failed for {param}')


sys.settrace(tracer)

bad_function(1)
bad_function(20)
bad_function(2)

It should be easy to store that information into a string for further investigation, or to handle the exception, if it is raised.

Upvotes: 3

Hunt for root cause of flaky test in Python

Answers (3)

Related Questions