Karim Agha
Karim Agha

Reputation: 3625

Data structure for efficient function call matching

I'm building a tool that among other things has to measure performance-related impact of changes in our product.

To get that done, I've implemented a profiler that traces whenever a function is called or returns and notifies me about that. First, I've dumped the output to a file to get a sense of the data I'll be working with and here is more-or-less how they look like:

FuncCall1
   FuncCall2
      FuncCall3
      FuncRet3
      FuncCall4
      FuncRet4
      FuncCall5
        FuncCall6
        FuncRet6
      FuncRet5
    FuncRet2
FuncRet1

To have a better visual understanding of how this data looks like, here is a graph of the first 10000 function calls: (x-axis: time, y-axis: depth/nesting): Function Call/Return graph (http://img444.imageshack.us/img444/4710/proflog.gif)

When a function begins execution, I will log it's name/identifier and the current high-precision timestamp and when it returns I will need to lookup the entry where I stored the start time and add a new timestamp that marks it return.

To sum things up, the operations that I'm going to perform on these data are:

  1. Insert a new function call mark with current timestamp.
  2. Lookup the most recent function call of a certain ID and store the return timestamp.
  3. See which other functions were called from within a certain function (and see where its spending its time) - for example If I'm looking at Function#2 in the previous example, I want to know that it calls Function#3, Function#4, Function#5 and Function#5 calls Function#6 then it returns (with all call/return timestamps marked).

Now, I have several ideas for data-structures that might be good for this scenario:

  1. An auto-balanced tree (i.e AVL) where the key for each node will be the function identifier and the value in each node would be a stack of timestamp pairs. This approach will give me fast insertion and lookup when marking function timestamps and the fact that each node is a stack, it will also take care of matching the right return timestamp to the start timestamp - Always (I assume) the latest return timestamp of a certain function should match the most nested/recent function call. In this approach, maintaining nested function calls with different identifiers would be a bit troublesome, because I will have to traverse the tree and match them basing on their timestamp to figure out their nesting - not ideal.

  2. Maintain a list of functions that did not return yet (that will preserve the call-stack info) and use skip-list where each level would be equal to function-call-nesting level. This approach would make operation #3 easier, but lookups will be slower and I might have to maintain very long lists of not returned functions - such as main(), that will have to be maintained throughout the lifetime of my application. Here I could also use a hashtable, to improve the lookup speed sacrificing some more memory usage. Memory usage is critical - this profiler easily generates about 20 MB / s.

The reason why I'm not using a plain-simple stack to track these data, is because I will need to periodically sync partial results to a different machines and have at least, partial-results available before everything returns.

I've looked over interval-trees, range-trees and other kind of data-structures that I'm aware of, but I can't find any that would meet all my 3 requirements efficiently.

Maybe there is a data structure that would meet them all that I'm not aware of? Any Ideas?

Update:

What about this:

Having a tree that would have function calls along with their nested calls and a separate stack for the functions that did not return.

Now every element on the stack will have a pointer to it's copy in the tree and when a new function call arrives, I will look at the top element on the stack, trace it's pointer to it's representation in the tree, add the new function call as a child to that call and push it's copy on the stack with a pointer to the newly created tree node.

For function returns, it's similar, for every function return, the latest entry on the stack will always be it's call - trace the call pointer, save the return time in the tree and pop the call.

Do you see any major flaws in my thinking?

Update 2:

My Approach worked perfectly. I will wait 2 days and answer my question.

Upvotes: 4

Views: 1878

Answers (3)

Matthieu M.
Matthieu M.

Reputation: 300349

The idea of a tree seems... wasteful.

What you are doing requires a simple stack.

  • Log the ID/entry timestamp pair
  • Execute child calls
  • Modify top-of-stack entry with exit timestamp

Also, all child calls are actually children of the same parent...

In your position, I would simply use:

  • RAII for automatic exit-timestamp logging
  • A deque-like structure for fast record append

The record structure would be like:

struct Record {
    unsigned level;
    unsigned id;
    unsigned entry;
    unsigned exit;
};

Then, you keep two thread-local structures:

extern thread_local unsigned CurrentLevel;
extern thread_local std::deque<Record> CallRecords;

And finally, you implement a simple RAII class:

class CallRecorder: boost::noncopyable() {
public:
    CallRecord(unsigned id):
        record(*CallRecords.insert(CallRecords.end(),
                                   Record{CurrentLevel++, id, time(), 0}))
    {
    }

    ~CallRecord() { record.exit = time(); --CurrentLevel; }

private:
    Record& record;
};

You could, potentially, pass down the ID of the parent to each child call, but it does not seem worth it. This is information you can reconstruct when exploiting the stream (keeping a separate stack on the side).

I only have two notes:

  • You might want to create your own implementation of deque to allocate bigger chunks (4K page for examples) and really optimize for append above all.
  • Using RAII, we deal with both regular returns and exceptions. Only abort are not logged, since the program terminates. It can be detected as it leaves an incomplete trace.

Upvotes: -1

Gold
Gold

Reputation: 134

You can use a trace class. The drawback: you have to declare an instance of this tracer in the very begining of each function you have to log/mesure. It also adds significant amount of cycles to your mesures, so only huge functions can be properly traced with this method.

Another way to achieve that is to write a real profiler but such a tool already exists ( gprof ), and write a parser for it. Still you can write your own... a realy long task.

I recommand you testing each function or group separately, in unit tests, this is the way we usualy do it efficiently. Then pops up a profiler and try to optimize the 10% code you are running 90% of the time. You are focusing on the small details by looking them far away.

Here is an outpu of one of my tool:

Fri Jul 09 00:49:12 2010 - [ 3799946229640 ] - D:\Code Project\include/design/BTL/algorithm/dispatch_policy.h::operator() # |...operator() ( ) {

Fri Jul 09 00:49:12 2010 - [ 3799946246830 ] - D:\Code Project\include/design/BTL/algorithm/dispatch_policy.h::operator() # |...shape *, shape *

Fri Jul 09 00:49:12 2010 - [ 3799946265738 ] - D:\Code Project\include/design/BTL/algorithm/dispatch_policy.h::operator() # |...} operator() : 46027 cpu_cycles

As you stated above the output is huge making it impractical for deep analisys, you cannot neither monitor a program for too long due to the 20Mb/s output stream. It's only usefull when you already know where to investigate. Another concern du to the théorical bandwidth required by such a tool, you HAVE TO use bufferred ostreams ... making the tool all the more intrusive to the real software. I've already experienced a x10 slowdown !

Upvotes: 1

orlp
orlp

Reputation: 117846

From the perspective of one thread I think the most efficient is to have a serious intrusive data structure - you combine the call stack and the AVL-tree, something like this:

// one of these per call
struct {
    function *func; // func in the tree (or ID)
    timestamp time; // timestamp of call
    call *prev_call; // previous function call
    call *next_call; // next function call
} call;

// one of these per function
struct {
    call *last_call; // last call of this function
    your_type id; // identifier

    // insert tree-specifics here
} function;

I haven't fully worked this out but I think this is the way to go.

Upvotes: 1

Related Questions