Reputation: 6855
I am working on optimizing an algorithm using SSE2 instructions, but I ran into this problem when testing the performance:
I) Intel e6750
II) Phenom II X4 2.8 GHz
Can anyone help me understand why this is happening? I'm really confused by the results.
In both cases I'm compiling with g++ using the -O3 flag.
PS: The algorithm doesn't use floating-point math; it uses the SSE integer instructions.
Upvotes: 6
Views: 1465
Reputation: 213170
Intel has made big improvements to their SSE implementation over the last 5 years or so, which AMD has not really kept up with. Originally both had only 64-bit execution units, and each 128-bit operation was broken down into 2 micro-ops. Ever since Core and Core 2 were introduced, though, Intel CPUs have had a full 128-bit SSE implementation, which means that 128-bit operations effectively got a 2x throughput boost (1 micro-op versus 2). More recent Intel CPUs also have multiple SSE execution units, which means you can get more than 1 instruction per clock of throughput for 128-bit SIMD instructions.
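To make "a 128-bit operation" concrete, here is a minimal sketch of the kind of SSE2 integer code this applies to. It is not the asker's algorithm; the function name `add_i32_sse2` and the choice of a plain element-wise add are assumptions made purely for illustration. Each `_mm_add_epi32` call corresponds to a single 128-bit integer add instruction (PADDD), which is the sort of instruction that gets split into two 64-bit micro-ops on hardware with only 64-bit SSE units.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper (illustration only): out[i] = a[i] + b[i]
// for n 32-bit integers, processed four at a time with SSE2.
void add_i32_sse2(const std::int32_t* a, const std::int32_t* b,
                  std::int32_t* out, std::size_t n) {
    std::size_t i = 0;

    // Main loop: one 128-bit load per input, one 128-bit add, one 128-bit store.
    // Unaligned loads/stores are used so the caller need not align the buffers.
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i sum = _mm_add_epi32(va, vb);  // four 32-bit adds in one instruction
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), sum);
    }

    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Timing a loop like this on both machines (same binary, same `-O3` build) would be one way to check whether the gap comes from SIMD throughput rather than from something else in the algorithm, such as memory access patterns.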
Upvotes: 4