Plamen
Plamen

Reputation: 660

Is there any difference performance difference between g++-mp-4.8 and g++-4.8?

I'm compiling the same program on two different machines and then running tests to compare performance.

There is a difference in the power of the two machines: one is MacBook Pro with a four 2.3GHz processors, the other is a Dell server with twelve 2.9 GHz processors.

However, the mac runs the test programs in shorter time!!

The only difference in the compilation is that I run g++-mp-4.8 on the machine mac, and g++-4.8 on the other.

EDIT: There is NO parallel computing going on, and my process was the only one run on the server. Also, I've updated the number of cores on the Dell.

EDIT 2: I ran three tests of increasing complexity, the times obtained were, in the format (Dell,Mac) in seconds: (1.67,0.56), (45,35), (120,103). These differences are quite substantial!

EDIT 3: Regarding the actual processor speed, we considered this with the system administrator and still came up with no good reason. Here is the spec for the MacBook processor:

http://ark.intel.com/fr/products/71459/intel-core-i7-3630qm-processor-6m-cache-up-to-3_40-ghz

and here for the server:

http://ark.intel.com/fr/products/64589/Intel-Xeon-Processor-E5-2667-15M-Cache-2_90-GHz-8_00-GTs-Intel-QPI

Upvotes: 1

Views: 201

Answers (2)

Chris
Chris

Reputation: 4231

The compilers are practically identical (-mp just signifies that this gcc version was installed via macports).

The performance difference you observed results from the different CPUs: The server is a "Sandy Bridge" microarchitecture, running at 3.5 GHz, while the MacBook has a newer "Ivy Bridge" CPU running at 3.4 GHz (single-thread turbo boost speeds).

Between Sandy Bridge and Ivy Bridge is just a "Tick" in Intel parlance, meaning that the process was changed (from 32nm to 22nm), but almost no changes to the microarchitecture. Still there are some changes in Ivy Bridge that improve the IPC (instructions per clock-cycle) for some workloads. In particular, the throughput of division operations, both integer and floating-point, was doubled. (For more changes, see the review on AnandTech: http://www.anandtech.com/show/5626/ivy-bridge-preview-core-i7-3770k/2 )

As your workload contains lots of divisions, this fits your results quite nicely: the "small" testcase shows the largest improvement, while in the larger testcases, the improved core performance is probably shadowed by memory access, which seems roughly the same speed in both systems.

Note that this is purely educated guessing given the current information - one would need to look at your benchmark code, the compiler flags, and maybe analyze it using the CPU performance counters to verify this.

Upvotes: 1

Ben Voigt
Ben Voigt

Reputation: 283793

I would like to highlight a feature that particularly skews results of single-threaded code on mobile processors:

enter image description here

enter image description here

Note that while there's a 500 MHz difference in base speed (the question mentioned 2.3 GHz, are we looking at the same CPU?), there's only a 100 MHz difference in single-threaded speed, when Turbo Boost is running at maximum.

The Core-i7 also uses faster DDR than its server counterpart, which normally runs at a lower clock speed with more buffers to support much larger capacities of RAM. Normally the number of channels on the Xeon and difference in L3 cache size makes up for this, but different workloads will make use of cache and main memory differently.

Of course generational improvements can make a difference as well. The significance of Ivy Bridge vs Sandy Bridge varies greatly with application.

A final possibility is that the program runtime isn't CPU-bound. I/O subsystem, speed of GPGPU, etc can affect performance over multiple orders of magnitude for applications that exercise those.

Upvotes: 1

Related Questions