user176168

Reputation: 1370

C++ hooking my own program's functions

So I want to profile my app, and specifically I want to record, measured from program start, when each of the functions called inside the program (ignoring functions in DLLs) is entered and exited, i.e. I want a simple table which looks something like this:

THREAD_ID FUNCTION_ADDRESS TIME EVENT_TYPE
5520      0xFF435360       0    ENTERED
5520      0xFF435ED3       25   ENTERED
5520      0xFF433550       40   ENTERED
5520      0xFF433550       50   EXITED
5520      0xFF433550       60   ENTERED
5520      0xFF433550       70   EXITED
5520      0xFF435ED3       82   EXITED
5520      0xFF435360       90   EXITED

For a program looking like this (ignoring compiler optimisation):

void test1(void)
{
   int a = 0;
   ++a;
}

void test(void)
{
    test1();
    test1();
}

void main(void)
{
    test();
}

I couldn't find any off-the-shelf solution to this; the nearest I could find was Microsoft's VSPerfReport, but it only outputs how long was spent in each function, not when it was entered and exited.

So I started looking into hooking all my functions with a simple function that writes to a buffer from which I can generate the above table. In order to do this I was thinking of creating a function, called at the start of main, that can go through the entire exe and modify the CALL instructions to call into my hook function instead.

The libraries out there like MinHook etc. all seem a little OTT for me, and probably wouldn't work anyway because it's an x64 app and I'm not trying to hook DLL functions.

So I was thinking of just modifying the JMP instruction that each of the CALL instructions lands on, i.e. for this program:

void main(void)
{
...asm prologue 
    test();
002375C9  call        test (235037h) 
}
...asm epilogue

The call here goes to a table of JMPs:

@ILT+40(__set_errno):
0023502D  jmp         _set_errno (243D80h)  
@ILT+45(___crtGetEnvironmentStringsA):
00235032  jmp         __crtGetEnvironmentStringsA (239B10h)  
test:
00235037  jmp         test (237170h)  
@ILT+55(_wcstoul):
0023503C  jmp         wcstoul (27C5D0h)  
@ILT+60(__vsnprintf_s_l):

I want to go through this table and re-route all the JMPs relating to functions in my application's .exe to my hook functions, which contain the timing code and then return back to the calling function.
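For what it's worth, the mechanics of rewriting a single thunk seem simple enough; something like this is roughly what I had in mind (assuming I've already located the thunk address somehow, and that my hook lands within +/-2GB of it so a rel32 JMP is still valid):

#include <windows.h>
#include <cstdint>

// patch a 5-byte "jmp rel32" thunk (opcode E9) so it jumps to my_hook instead
void patch_thunk(std::uint8_t* thunk, void* my_hook)
{
    DWORD old_protect;
    VirtualProtect(thunk, 5, PAGE_EXECUTE_READWRITE, &old_protect);

    // rel32 is relative to the end of the 5-byte instruction
    std::int64_t rel = reinterpret_cast<std::uint8_t*>(my_hook) - (thunk + 5);

    thunk[0] = 0xE9;
    *reinterpret_cast<std::int32_t*>(thunk + 1) = static_cast<std::int32_t>(rel);

    VirtualProtect(thunk, 5, old_protect, &old_protect);
    FlushInstructionCache(GetCurrentProcess(), thunk, 5);
}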

So what does ILT stand for? I'm assuming it's some sort of lookup table. And how would I go about getting hold of it?

Is this even possible? I've heard of IAT hooking, but that looks to me to only apply when hooking DLLs. Also, I've ignored exiting here, although another JMP in place of the RET instruction might help there?

Thanks for any help

Upvotes: 4

Views: 2030

Answers (4)

amdn

Reputation: 11582

gcc has an option to generate calls to hooks for function entry and exit. You compile with -finstrument-functions and the compiler generates calls to __cyg_profile_func_enter and __cyg_profile_func_exit. You can read more in the gcc documentation http://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html. A good article with an example of how to use this is here http://www.ibm.com/developerworks/library/l-graphvis/.
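For example, a rough sketch of what the two hooks could look like (they need extern "C" linkage in C++, and must not be instrumented themselves):

#include <stdio.h>
#include <time.h>
#include <pthread.h>

extern "C" {

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* /*call_site*/)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    fprintf(stderr, "%lu %p %ld.%09ld ENTERED\n",
            (unsigned long)pthread_self(), this_fn, (long)ts.tv_sec, ts.tv_nsec);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* /*call_site*/)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    fprintf(stderr, "%lu %p %ld.%09ld EXITED\n",
            (unsigned long)pthread_self(), this_fn, (long)ts.tv_sec, ts.tv_nsec);
}

} // extern "C"

Build the whole program with g++ -finstrument-functions and every entry and exit of your own functions will hit these hooks.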

Upvotes: 1

Yakk - Adam Nevraumont

Reputation: 275660

#include <cstddef>
#include <memory>

// concrete stand-in for the time type; get_current_time() may be asm (e.g. rdtsc)
struct my_time_t { unsigned long long ticks; };
my_time_t get_current_time();

struct timer_buffer;

struct timestamp {
  static timer_buffer* buff;
  timestamp const* loc;      // address of this object -- tracks the stack
  my_time_t time;
  const char* filename;
  std::size_t linenum;
  timestamp() = default;     // needed so timer_buffer can allocate an array of these
  timestamp( my_time_t t, const char* f = nullptr, std::size_t l = 0 ):
    loc(this), time(t), filename(f), linenum(l)
  {
    go();
  }
  void go();                 // defined below, once timer_buffer is complete
};

struct timer_buffer {
  std::unique_ptr<timestamp[]> big_buffer;
  std::size_t buffer_size;
  std::size_t current_index;
  std::size_t written;
  timer_buffer( std::size_t size ):
    big_buffer( new timestamp[size] ), buffer_size(size), current_index(0), written(0) {}
  void append( timestamp const& t ) {
    big_buffer[current_index] = t;
    ++current_index;
    ++written;
    current_index = current_index % buffer_size;   // circular buffer, no branch
  }
};

inline void timestamp::go() {
  buff->append(*this);
}

struct scoped_timestamp : timestamp {
  scoped_timestamp( my_time_t t, const char* f = nullptr, std::size_t l = 0 ):
    timestamp(t, f, l)
  {}
  ~scoped_timestamp() {
    time = get_current_time();   // refresh so the second record carries the exit time
    go();
  }
};
#define TIMESTAMP_SCOPE( NAME ) scoped_timestamp NAME(get_current_time(), __FILE__, __LINE__);
#define TIMESTAMP_SPOT() do{timestamp _(get_current_time(), __FILE__, __LINE__);}while(false)

Create timestamp::buff somewhere. Make the buff big enough. Write a fast and efficient get_current_time().

Insert TIMESTAMP_SCOPE(_) at the start of functions you think you have problems with.

Insert TIMESTAMP_SPOT(); between locations that you think will take time.
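For example (hypothetical usage, with the test1 from the question):

void test1(void)
{
   TIMESTAMP_SCOPE(ts);    // records entry now, and exit when test1 returns
   int a = 0;
   ++a;
   TIMESTAMP_SPOT();       // records a single point in time
}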

Add some post-processing of the timer_buffer before you shut down -- write it out to disk, or whatever. Keep an eye on whether written > current_index, in which case you have wrapped the buffer. Note that none of the above code contains any branches, so it should be relatively performance friendly (other than constantly moving the array owned by buff into the cache).

The loc exists so you can relatively easily find create/destroy pairs (as its binary value tracks the value of the stack!), so you can analyze the buffer after the fact for which function calls took too long. Throwing together a visualizer isn't all that hard, and I've seen something much like the above done for detecting ms level timing glitches and hiccups in video streaming driver code.

Start the analysis at current_index and work backwards, looking for pairs, until you hit 0 (or, if written != current_index, until you wrap around to current_index+1). It shouldn't be hard to recover a call graph (if you want).
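A rough sketch of that backwards walk (assuming profiling has already stopped, that the ticks stand-in from above is the time representation, and that TIMESTAMP_SPOT records are either absent or filtered out by linenum):

#include <cstdio>
#include <map>

void find_pairs(timer_buffer const& b)
{
    std::size_t valid = b.written < b.buffer_size ? b.written : b.buffer_size;
    std::map<timestamp const*, my_time_t> open;    // loc of a record seen later in time

    // newest record first: step back from current_index, wrapping around
    for (std::size_t i = 0; i < valid; ++i) {
        std::size_t idx = (b.current_index + b.buffer_size - 1 - i) % b.buffer_size;
        timestamp const& t = b.big_buffer[idx];
        auto it = open.find(t.loc);
        if (it == open.end()) {
            open[t.loc] = t.time;                  // exit half of a scoped_timestamp
        } else {
            // t is the matching entry record; it->second holds the exit time
            std::printf("%s:%zu took %llu ticks\n",
                        t.filename ? t.filename : "?", t.linenum,
                        it->second.ticks - t.time.ticks);
            open.erase(it);
        }
    }
}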

Stripping out much of the above, and simply using unique tags for each timestamp, can reduce the size of your buffer, but makes it a bit harder to reconstruct your call graph.

Yes, this isn't auto-instrumentation. But the part of your code that is behaving slowly is going to be a relatively small part of it. So start instrumenting with something like the above, and I'm guessing you'll get your answer faster than by disassembling your compiler's binary output and messing around with jump tables.

Upvotes: 0

Necrolis

Reputation: 26171

Have you looked into Google's profiling tools? You might find it a little easier to modify those instead of crafting your own. They do code insertion to perform their profiling, so at minimum their injection framework would be beneficial to you.

However, for something like this, you mostly want to avoid timing overhead, so I'd suggest tracking by address and then, when the profiling is complete, transforming the addresses into symbol names. The hooking itself can also be an arduous task; I'd suggest crafting an all-in-one wrapper that doesn't alter the function entries or exits, but rather redirects call sites.
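For example, once you have a buffer of raw addresses, something like this sketch using DbgHelp (link with Dbghelp.lib; the function name here is just illustrative) can turn each one into a symbol name after the run:

#include <windows.h>
#include <dbghelp.h>
#include <cstdio>

void print_symbol(DWORD64 address)
{
    HANDLE process = GetCurrentProcess();
    SymInitialize(process, nullptr, TRUE);          // load symbols for loaded modules

    char buffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME] = {};
    SYMBOL_INFO* symbol = reinterpret_cast<SYMBOL_INFO*>(buffer);
    symbol->SizeOfStruct = sizeof(SYMBOL_INFO);
    symbol->MaxNameLen = MAX_SYM_NAME;

    DWORD64 displacement = 0;
    if (SymFromAddr(process, address, &displacement, symbol))
        std::printf("0x%llX -> %s+0x%llX\n", address, symbol->Name, displacement);

    SymCleanup(process);
}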

So what does ILT stand for? I'm assuming it's some sort of lookup table. And how would I go about getting hold of it?

Import Lookup Table, and it's not going to be of much use if you plan on profiling internal functions as well. Getting hold of it requires spelunking through the internals of your platform's module format (PE, ELF, Mach-O).

Upvotes: 2

vonbrand

Reputation: 11791

On Linux you can use gprof(1) to get that data. But please take to heart what Bentley says about performance in his "Programming Pearls". Its Part II is a distillation of his "Writing Efficient Programs" (sadly out of print), a very detailed discussion of how (and, much more importantly, when) to optimize code.

Upvotes: 0
