YoungPadawan

Reputation: 31

Analysing assembly code performance

I'm new to Stack Overflow and hope to get some advice on how to approach a problem I'm having. Having little assembly experience, I am having a difficult time reasoning about the performance characteristics of a piece of code. The code is written in C and runs on a PowerPC architecture (an old Apple G5). Compiled with O3 plus some other optimization, the code actually runs about 30% slower than with just O3. The difference in the assembly output boils down to a couple of instructions (say 3-4) and their arrangement.

My problem is that, due to my inexperience, I have difficulty understanding why the assembly output performs worse in one case and better in the other. Tools such as oprofile are not really helpful here, and the official IBM instruction documentation does not give any insight (at least from what I have seen so far) into the performance characteristics of a particular instruction. How does one approach this kind of analysis problem? As mentioned, I have little experience with assembly and pipeline analysis, so I would appreciate any suggestions on how one usually approaches this kind of problem. Are there any tools out there that can aid me?

Also, I am not really interested in why the compiler generated the code the way it did (and in a sense I am not really interested in how the original C code works), I'm really only interested in understanding the assembly performance analysis.

Update

I just want to give a brief update on the problem: using a PowerPC pipeline simulator from IBM, it was possible to see exactly what happened in the pipeline, which made the problem much easier to understand (it turned out to be related to full issue queues and the formation of dispatch groups). I suggest that anyone looking at similar problems use a pipeline simulator; it helps a lot in understanding the performance of your program! Due to the complexity of powerful machines, it seems very difficult to analyze the performance characteristics of a program without a pipeline simulator. This probably means that to truly understand how your code affects performance, it is necessary to understand the architecture the code runs on.
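
To give a rough feel for why the arrangement of a few instructions can matter, here is a toy sketch of dispatch-group formation. It is not the IBM simulator and not my actual code. It assumes a simplified rule often described for the G5's PowerPC 970: a group holds at most four non-branch instructions plus one branch, and a branch always closes its group. The function name `form_groups` and the `"op"`/`"br"` labels are made up for illustration; cracked or microcoded instructions, issue queues, and latencies are ignored entirely.

```python
def form_groups(instrs, max_non_branch=4):
    """Greedily pack an instruction stream into dispatch groups.

    instrs is a list of "op" (non-branch) or "br" (branch) labels.
    Assumed rule (simplified): up to max_non_branch non-branch
    instructions per group, and a branch terminates the group.
    """
    groups, current = [], []
    for kind in instrs:
        if kind == "br":
            # A branch takes the final slot and closes the group.
            current.append(kind)
            groups.append(current)
            current = []
        else:
            # Start a new group once the non-branch slots are full.
            if len(current) == max_non_branch:
                groups.append(current)
                current = []
            current.append(kind)
    if current:
        groups.append(current)
    return groups


# Same instruction mix (8 ops, 2 branches), two different arrangements.
tight = ["op"] * 4 + ["br"] + ["op"] * 4 + ["br"]
loose = ["op"] * 2 + ["br"] + ["op"] * 6 + ["br"]

print(len(form_groups(tight)))  # 2 groups
print(len(form_groups(loose)))  # 3 groups
```

Under this toy rule the second ordering needs one more group for the same work, which is the kind of effect that only became visible to me in the simulator.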

Upvotes: 2

Views: 245

Answers (1)

[...] As mentioned, I have little experience with assembly and pipeline analysis and thus I would appreciate any suggestions on how one usually approach these kind of problems. [...]

I'd suggest the following material and usage examples. Although they're based on IBM POWER7(+), the ideas and explanations there might give you some context:

"Commonly Used Metrics for Performance Analysis – POWER7" [0]
    (First, this paper briefly covers the POWER7 execution pipeline and the PMU hardware. ...)

"Comprehensive PMU Event Reference – POWER7" [0]
    (Performance Monitor Unit instrumentation. These events can be measured using tools like...)

"Evaluate performance for Linux on POWER" (developerWorks)
    (Learn to evaluate Linux on POWER® performance issues that focus on compiled language (such as C or C++) environments...)

"Java performance improvements seen on POWER7+" (PowerLinux Community)
    (The processors feature a built-in Performance Monitoring Unit (PMU), designed to provide instrumentation for performance monitoring, workload characterization, and code analysis....)

Source

Upvotes: 1
