Transact Charlie
Transact Charlie

Reputation: 2203

How to trace execution of an existing python twisted application

So... probably many of you will recognise this from your professional lives.

You've inherited a code base of hundreds of files, thousands of lines each with a mix of objects, closures and globals, with no unit tests, and lots of it may actually be dead code that is never called any more. One guy (who is a super smart guy but a complete hacker) wrote most (like all of it) of it and he's not around any more. Interesting quirks of python abound which do desired things but which don't have comments to indicate why they are there.

d = dict(d)

where d was a dictionary already (passed into the method), is my personal favorite so far... (shallow copy for a reason - took me a while to work that out) Generic methods that are used by different paths through the system and know what they need to do by

if <passed_in_param>["XXX"] = <something>:
    <pathA>
    return A
elif <passed_in_param>["ZZZ"] = <somethingElse>:
    <pathB>
    return B

which is... eugh.. etc. etc. etc - just the usual stuff that itches to be refactored.

My real problem is that I need to get an understanding of how the application works before I can really start debugging it...... again - familiar territory for many I'm sure.

The system uses twisted to host a bunch of http endpoints (using twistd) and there are loads of endpoints occupying various roots. Each of which does 'interesting' things but essentially they all communicate with various databases or third party rest endpoints and then aggregate something format something and return a rest response. Twisted is a really good framework for that!

Now, is there any way that I can ask the twisted framework to help me start debugging this? something as simple as setting some flag so that whenever a callback is fired - it'll print the method and arguments passed to the callback? This isn't something I've been able to find in the documentation.

What do you guys do? Any tips along with answers would be welcomed.

I'm not above monkey patching twisted on my local development box if I had too.

Update - some hacky progress............ Not being above some hackyness myself, I've come up with something a little better than nothing.

In the tap.py file the twistd launcher launches I've put this (in case anyone else finds this useful):

trace_exclusions = ["twisted", "DQL2SQL", "DDBClient", "exon_parser", "ontology", "pglib"]

def tracefunc(frame, event, arg, indent=[0]):
    if frame.f_code.co_filename.startswith("<!redacted!>/src/"):
        if not any(te in frame.f_code.co_filename for te in trace_exclusions):
            if event == "call":
                indent[0] += 2
                print "-" * indent[0], frame.f_code.co_filename.replace("/<!redacted!>/src/", ""), frame.f_code.co_name, frame.f_code.co_varnames
            if event == "return":
                indent[0] -= 2
    return tracefunc

import sys
sys.settrace(tracefunc)

and I'm getting somewhat useful output:

-- servers/http/base.py getResourceFor ('self', 'request', 'path', 'full_path', 'channel', 'category', 'service', 'version', 'market', 'locale', 'currency', 'query_id', 'extra_id')
-- servers/carhire_api/carhire_api.py render_GET ('self', 'request', 'entity_id', 'distance', 'dql', 'd', 'response')
---- servers/carhire_api/carhire_api.py get_int_parameter ('request', 'param_key')
---- servers/carhire_api/carhire_api.py get_int_parameter ('request', 'param_key')
---- beautify/deferred_entity_beautify.py get_entity_id ('pkid', 'val', 'iface', 'd')
------ cache/apc_cache.py __getitem__ ('self', 'key')
------ beautify/deferred_entity_beautify.py get_interface_id ()
------ kraken/client.py get ('self', 'key', 'd')
-------- kraken/client.py _get_multi ('self', 'connection', 'keys', 'task', 'd')
---------- kraken/twist.py get ('self', 'namespace', 'keys', 'task', 'the_size', 'req', 'r', 'the_timeout', 'l', 'idx', 'tmp_keys', 'd_l', 'exc')
------------ kraken/__init__.py get ('self', 'namespace', 'keys', 'req')
-------------- kraken/__init__.py request ('self', 'method', 'namespace', 'data', 'req')
---------------- kraken/__init__.py <genexpr> ('.0', 'x')
---------------- kraken/__init__.py <genexpr> ('.0', 'x')

But if anyone has anything better - I'd love to know!

Upvotes: 3

Views: 932

Answers (2)

Transact Charlie
Transact Charlie

Reputation: 2203

Adding this in as the original post is already long enough. In addition to the spewer that Jean-Paul noted - I've written a hack in the tap.py that launches the application. I'll post it here in case it may be of any use to anyone in the future. It does give me the information I need - which is what happens when I hit an endpoint... Obviously something that tied this back to the original deferred chain would be rather more awesome. I took some of this code from the links Jean-Paul posted - thanks.

trace_exclusions = [
    "twisted",
    "DQL2SQL",
    "DDBClient",
    "exon_parser",
    "ontology",
    "pglib",
    "servers/interfaces/counter.py",
    "event_feed/feed_register.py",
    "database_notification_manager",
    "revenue_optimization"
    ]

def extractArgs(frame):
    co = frame.f_code
    dict = frame.f_locals
    n = co.co_argcount
    if co.co_flags & 4: n = n+1
    if co.co_flags & 8: n = n+1
    result = {}
    for i in range(n):
        name = co.co_varnames[i]
        if name != "self":
            r = dict.get(name, "----")
            r = str(r)
            if len(r) > 30:
                r = r[0:28] + '..'
            result[name] = r 
    return result

def tracefunc(frame, event, arg, indent=[0]):
    if frame.f_code.co_filename.startswith("/home/developer/projects/befogg/servers/nlp/src/"):
        if not any(te in frame.f_code.co_filename for te in trace_exclusions):
            if event == "call":
                indent[0] += 1
                print "-" * indent[0], frame.f_code.co_filename.replace("/home/developer/projects/befogg/servers/nlp/src/", ""), frame.f_code.co_name, "VARS:", frame.f_code.co_varnames
                print "-" * indent[0], frame.f_code.co_filename.replace("/home/developer/projects/befogg/servers/nlp/src/", ""), frame.f_code.co_name, "ARGS:", extractArgs(frame)
            if event == "return":
                indent[0] -= 1
    return tracefunc

import sys
sys.settrace(tracefunc)

Upvotes: 0

Jean-Paul Calderone
Jean-Paul Calderone

Reputation: 48335

Some work has been started on tools to make use of Deferred in particular easier to debug. There is some information on https://twistedmatrix.com/trac/ticket/1402 and some code in a branch referenced from https://twistedmatrix.com/trac/ticket/4304. These tools aren't actually part of Twisted yet so it's a little inconvenient to try to use them. At worst perhaps they'll give you some ideas, and maybe you'll actually find a way to use them on your application.

Similar to the trace-hook based tracer you've already written, there's twisted.python.util.spewer and epsilon.spewer.Spewer. These may not actually provide any more information than the tool you've already built, though.

Upvotes: 3

Related Questions