Santiago Alessandri

Reputation: 6855

SSE program takes a lot longer on AMD than on Intel

I am working on optimizing an algorithm using SSE2 instructions, but I ran into this problem while testing its performance:

I) Intel E6750

  1. Running the non-SSE2 algorithm 4 times takes 14.85 seconds
  2. Running the SSE2 algorithm once (it processes the same data) takes 6.89 seconds

II) Phenom II X4 2.8 GHz

  1. Running the non-SSE2 algorithm 4 times takes 11.43 seconds
  2. Running the SSE2 algorithm once (it processes the same data) takes 12.15 seconds

Can anyone help me understand why this is happening? I'm really confused by the results.

In both cases I'm compiling with g++ using the -O3 flag.

PS: The algorithm doesn't use floating-point math; it uses SSE's integer instructions.

Upvotes: 6

Views: 1465

Answers (1)

Paul R

Reputation: 213170

Intel has made big improvements to its SSE implementation over the last 5 years or so, which AMD has not really kept up with. Originally both vendors really had just 64-bit execution units, and 128-bit operations were broken down into 2 micro-ops. Ever since Core and Core 2 were introduced, though, Intel CPUs have had a full 128-bit SSE implementation, which means that 128-bit operations effectively got a 2x throughput boost (1 micro-op versus 2). More recent Intel CPUs also have multiple SSE execution units, which means you can get more than 1 instruction per clock of throughput for 128-bit SIMD instructions.
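If you want to see how much of this shows up on a particular machine, a small timing harness can help. The following is only a rough sketch, not the algorithm from the question: the workload (an element-wise 32-bit integer add) and the names `add_scalar`, `add_sse2`, `Timings` and `compare_once` are all made up for illustration. It uses the real SSE2 intrinsics `_mm_loadu_si128`, `_mm_add_epi32` and `_mm_storeu_si128` from `<emmintrin.h>`, and falls back to plain scalar code when SSE2 is not available.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical workload: out[i] = a[i] + b[i], one element at a time.
void add_scalar(const std::uint32_t* a, const std::uint32_t* b,
                std::uint32_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Same workload, four 32-bit lanes per 128-bit SSE2 operation.
void add_sse2(const std::uint32_t* a, const std::uint32_t* b,
              std::uint32_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_add_epi32(va, vb));
    }
#endif
    // Scalar tail (or the whole array when SSE2 is unavailable).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

struct Timings {
    double scalar_s;  // seconds spent in the scalar version
    double sse2_s;    // seconds spent in the SSE2 version
    bool match;       // true if both versions produced identical output
};

// Runs both versions `repeats` times over `n` elements and times them.
Timings compare_once(std::size_t n, int repeats) {
    std::vector<std::uint32_t> a(n), b(n), out_scalar(n), out_sse2(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<std::uint32_t>(i);
        b[i] = static_cast<std::uint32_t>(3 * i + 1);
    }

    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    for (int r = 0; r < repeats; ++r)
        add_scalar(a.data(), b.data(), out_scalar.data(), n);
    auto t1 = clock::now();
    for (int r = 0; r < repeats; ++r)
        add_sse2(a.data(), b.data(), out_sse2.data(), n);
    auto t2 = clock::now();

    return Timings{std::chrono::duration<double>(t1 - t0).count(),
                   std::chrono::duration<double>(t2 - t1).count(),
                   out_scalar == out_sse2};
}
```

Call `compare_once` from `main` with a buffer size and repeat count of your choosing and print the two timings on each CPU. Bear in mind that at -O3 g++ may auto-vectorize the scalar loop as well, so for a trivial workload like this one the gap can be much smaller than for a hand-written SSE2 algorithm, and a large buffer will mostly measure memory bandwidth rather than execution-unit throughput.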

Upvotes: 4
