Reputation:
Suppose the code below is my class. It's simplified and not complete. Let's focus on the implementation of operator().
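#include <chrono>
using std::chrono::steady_clock;

class Delta {
public:
    // Returns the ticks elapsed since the previous call.
    long long operator()() {
        auto now = steady_clock::now();
        auto delta = (now - last).count();
        last = now;
        return delta;
    }
private:
    steady_clock::time_point last;
};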
operator() may be called thousands of times per second. I wonder whether frequently allocating and deallocating the variables now and delta may hurt the performance of operator(). So is it better to make now and delta data members of class Delta if I want to maximize speed? But I have also heard that local variables may not even exist once the code is compiled, so the overhead may not exist either.
Well, actually the speed of this operator doesn't make any difference to my application's speed. I just want to know a compiler-neutral answer. When this situation comes up, should I use data members or local variables?
Upvotes: 1
Views: 1695
Reputation: 2862
I don't disagree with any other answer, but let me try to explain this in simple terms without machine code. I am going to ignore some real life details that are important, but don't teach the concepts you are asking about.
Let's say you have a function with these variables:
int a;
int b;
int c;
int d;
During compile time, the compiler adds up the sizes of all the local variables and when the function is called the run-time code allocates enough stack space for all the variables. So if sizeof(int) is 4, then the above variables need 16 bytes of stack space. Most compilers use a machine register to hold the stack pointer (sp) so when our function is called the run-time code will do something like
sp = sp + 16
to reserve space for our 4 variables. Note that the run-time code to allocate local variables takes the same time if the function has 1 or 1000 local variables. There is no cost per variable (unless they have ctors to call). If we have a C statement like
d = b;
the pseudo machine code would look like
*(sp+12) = *(sp+4)
where 12 is the offset of variable d on the stack and 4 is the offset of b. (The offsets will not be this simple, there is other stuff allocated on the stack.)
When you define a struct / class with member variables like
class X {
    int a;
    int b;
    int c;
    int d;
    void foo() { d = b; }
};
the compiler also adds up the sizes of all the variables and assigns offsets to each one. But now the code inside foo() becomes
*(this+12) = *(this + 4)
While sp is almost always kept in a machine register, the 'this' pointer is only highly likely to be in a machine register. Modern compilers look at what variables are used the most and store those variables in registers. Since 'this' is usually referenced a lot (often implicitly) it usually gets assigned to a register. When 'this' is in a register, the performance should be the same.
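As a rough illustration of the member offsets described above, here is a minimal sketch. It declares X as a struct so that the members are public, and the names offset_of_b and offset_of_d are made up for this example. The exact offsets depend on the platform's ABI; on common platforms with no padding between int members they come out as sizeof(int) and 3 * sizeof(int).

```cpp
#include <cstddef>  // offsetof, std::size_t

// Same members as class X above, declared as a struct so they are public.
struct X {
    int a;
    int b;
    int c;
    int d;
    // Conceptually: *(this + offset_of_d) = *(this + offset_of_b)
    void foo() { d = b; }
};

// Byte offsets the compiler assigned to b and d inside X
// (hypothetical names, introduced only for this sketch).
constexpr std::size_t offset_of_b = offsetof(X, b);
constexpr std::size_t offset_of_d = offsetof(X, d);
```

The offsets are compile-time constants, which is why reaching a member through this costs about the same as reaching a local through sp.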
Upvotes: 0
Reputation:
Optimization generally depends on the compiler. But assuming that you are using a reasonably decent compiler, there will be no performance penalty, so don't worry about it. To prove it, I have compiled your code with gcc 4.7 at optimization level 3:
call 400770 <std::chrono::system_clock::now()@plt> ;; Call.
mov rdx,rax ;; Remember temporary value in %rdx.
sub rax,QWORD PTR [rbx] ;; Subtract.
mov QWORD PTR [rbx],rdx ;; Write back.
Depending on context, it may get optimized further, or it may get worse. As an example of when a temporary variable can end up on the stack: if you put a lot of code between now and last and the register allocation algorithm cannot place all of the variables in registers, it will resort to using the stack. So for actual results you have to check the generated machine code. But frankly, there is not a lot to optimize here, except one obvious thing. If you care about performance that much, what you have to worry about is the large number of calls through the PLT. In other words, don't use std::chrono::system_clock::now().
Upvotes: 1
Reputation: 490058
On x86-64, I'd expect this code to end up with both now and delta allocated in RAX. In assembly language, the code would look something like this:
assume RSI:ptr _Delta
call steady_clock::now()
sub rax, [rsi].last
mov [rsi].last, rax
ret
Of course, in real assembly language you'd see the mangled name for steady_clock::now() (for one example), but you get the general idea. Upon entry to any non-static member function, this is going to be in some register. The return value always goes in rax. I don't see any particularly good reason a compiler would need (or even want) to allocate space for any other variables.
On 32-bit x86, there's a much higher likelihood that this would end up using some stack space, though it's possible that it would return a 64-bit value in EDX:EAX, in which case things would end up fairly similar to what's above, just using one more register.
Most other processors start out with more registers than an x86, so the register pressure is lower. On a SPARC, for example, a routine will normally start with 8 local registers free and ready for use, so allocating now in a register would be a near certainty.
Bottom line: you're unlikely to see a significant speed difference, but if you do see a difference, I'd guess it's more likely to favor using a local variable than a member variable.
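For anyone who wants to compare the two options directly, here is a minimal sketch of both variants side by side. The class names DeltaLocal and DeltaMember are made up for this example, and the initialization of last is my addition rather than part of the question's code.

```cpp
#include <chrono>

using std::chrono::steady_clock;

// Variant from the question: now and delta are local variables.
class DeltaLocal {
public:
    long long operator()() {
        auto now = steady_clock::now();
        auto delta = (now - last).count();
        last = now;
        return delta;
    }
private:
    steady_clock::time_point last = steady_clock::now();
};

// Hypothetical alternative: now and delta promoted to data members.
class DeltaMember {
public:
    long long operator()() {
        now = steady_clock::now();
        delta = (now - last).count();
        last = now;
        return delta;
    }
private:
    steady_clock::time_point last = steady_clock::now();
    steady_clock::time_point now;
    long long delta = 0;
};
```

The member version makes every object larger, and the compiler typically has to write now and delta back to memory on each call, whereas the locals can stay in registers. That is consistent with the guess that any difference would favor the local variables.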
Upvotes: 3
Reputation: 59997
It will not make much (if any) difference. The OS allocates memory (including the stack) in pages. The stack will therefore probably not outgrow its current page, so the process will not need a trip into the OS to obtain another one.
As for a compiler-neutral answer, the speed will boil down to context switching, other things running on the processor, and so on.
Besides, some people like yourself seem to focus on micro performance improvements but miss the bigger picture. It is best to find out where the bottlenecks are first and concentrate on those. Remember the 80/20 rule.
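If you do want numbers before deciding where to spend effort, a rough timing helper along these lines can serve as a starting point. This is only a sketch: average_ns_per_call is a made-up name, and a real profiler will tell you far more about where the time actually goes.

```cpp
#include <chrono>

// Hypothetical helper: average wall-clock cost of one call to f, in nanoseconds.
// f is assumed to return something convertible to long long.
template <typename F>
double average_ns_per_call(F f, int iterations) {
    using clock = std::chrono::steady_clock;
    volatile long long sink = 0;  // keeps the calls from being optimized away
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = sink + f();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Passing an instance of the class from the question, for example average_ns_per_call(Delta{}, 1000000), gives a ballpark per-call cost to weigh against the rest of the application.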
Upvotes: 1
Reputation: 767
Of course there is a performance issue in your code.
Every time operator() is called, two variables are created and destroyed on the stack (this really does happen).
In the short term you will not notice the cost, because the system always reserves some memory for the stack and the same memory is accessed every time.
But in the long run (a performance run) you will be able to see the difference.
Upvotes: 0