Reputation: 21589
We know that modern processors can compute sine and cosine directly in hardware, since they have opcodes for these operations. My question is: how many cycles do these instructions normally take? Do they take constant time, or does it depend on the input parameters?
Upvotes: 5
Views: 7457
Reputation: 941208
Talking about "cycles for an instruction" stopped being simple on modern processors quite a while ago. Today's processors contain multiple execution units; their operation can overlap, and instructions can execute out of order.
A good example of the essential considerations is given in the Intel processor manual, volume 4, appendix C. It breaks down instruction timing into latency and throughput. Latency is the number of cycles an execution unit requires to complete a micro-op. Throughput is the number of cycles before the execution unit can accept the same instruction again. Throughput is generally lower than latency, and can even be fractional in the tables, a side effect of there being more than one execution unit of the same type. The type is important: it tells you whether instructions can overlap.
Maybe you already see the essential message here: it greatly depends on what other instructions surround the code you are interested in timing. Those other instructions may well execute concurrently with the expensive one, at which point they effectively take zero cycles. Or they may not, stalling the pipeline because the execution unit is still busy with a previous instruction. These are the kind of details that programmers who write code optimizers care a great deal about.
Looking at sample data from the manual, picking the most modern core in the tables, you get a much better bang for the buck from the SIMD instructions.
The only meaningful thing to do is measure, not assume.
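For instance, here is a minimal measurement sketch in C (assuming POSIX clock_gettime and whatever sin() your C library provides, which may or may not use a hardware sine instruction at all). A dependent chain of calls exposes something close to latency, while independent calls let the processor overlap work and approach throughput; the iteration count and the exact numbers are illustrative only.

```c
/* Sketch: measure, don't assume. Compile with e.g. `gcc -O2 bench.c -lm`. */
#include <math.h>
#include <stdio.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    const int n = 10 * 1000 * 1000;

    /* Dependent chain: each call needs the previous result. */
    double t0 = now_sec();
    double x = 0.5;
    for (int i = 0; i < n; i++)
        x = sin(x) + 1.0;              /* the +1 keeps the argument moving */
    double t1 = now_sec();

    /* Independent calls: the arguments do not depend on earlier results,
       so the processor is free to overlap the computations. */
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += sin(0.5 + i * 1e-7);
    double t2 = now_sec();

    printf("dependent:   %.1f ns/call (x=%g)\n", (t1 - t0) / n * 1e9, x);
    printf("independent: %.1f ns/call (sum=%g)\n", (t2 - t1) / n * 1e9, sum);
    return 0;
}
```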
Upvotes: 9
Reputation: 222244
The times vary depending on the processor model. Times typically range from tens of CPU cycles to a hundred or more.
(The times consumed by many instructions vary depending on circumstances, because instructions use a variety of resources in the processor [dispatcher, execution units, rename registers, and more], so how long an instruction delays other work depends on what else is going on in the processor. For example, if some code is doing almost entirely load and store instructions, then a very occasional sine instruction might not slow its execution at all. However, instructions that take tens of CPU cycles are usually dominated by their times in the execution unit, which is the part that does the actual numerical calculation.)
The execution times may vary depending on input parameters. Large arguments to trigonometric functions must be reduced modulo 2π, which is a complicated problem by itself.
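As a rough illustration of that input dependence, the sketch below (again assuming POSIX timing and the C library's sin(), not necessarily a hardware instruction) times calls with a small argument against calls with a huge one that forces an expensive reduction; the exact ratio depends entirely on your processor and math library.

```c
/* Sketch: large arguments can make sine noticeably slower. */
#include <math.h>
#include <stdio.h>
#include <time.h>

static double ns_per_call(double arg, int n)
{
    struct timespec a, b;
    volatile double sink = 0.0;          /* keep the calls from being optimized out */
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < n; i++)
        sink += sin(arg + i);            /* vary the argument slightly each time */
    clock_gettime(CLOCK_MONOTONIC, &b);
    (void)sink;
    return ((b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) * 1e-9) / n * 1e9;
}

int main(void)
{
    const int n = 5 * 1000 * 1000;
    printf("sin near 1.0:  %.1f ns/call\n", ns_per_call(1.0, n));
    printf("sin near 1e18: %.1f ns/call\n", ns_per_call(1e18, n));  /* needs heavy reduction */
    return 0;
}
```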
In the Mac OS X math library, we generally write our own implementations, often in assembly language, for various reasons that may include speed, conformance to standards, suitability for the application binary interface, and other features.
If you are just curious, then “tens to hundreds of processor cycles” may be a good enough answer, especially without specifying a particular processor model. Essentially, the time is long enough that you should not use these operations without good reason. (E.g., I have seen code that obtains π as 4·arctan(1). Do not do that.)
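A hypothetical before/after of that last point, illustrative only:

```c
#include <math.h>
#include <stdio.h>

/* Get π from a constant instead of computing it at run time. */
static const double PI = 3.141592653589793238462643383279502884;  /* or M_PI where available */

double pi_slow(void) { return 4.0 * atan(1.0); }  /* tens of cycles or more, every call */
double pi_fast(void) { return PI; }               /* a compile-time constant */

int main(void)
{
    printf("%.17g\n%.17g\n", pi_slow(), pi_fast());
    return 0;
}
```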
If you have other reasons for asking, you should explain, so that answers can be focused.
Upvotes: 15