Reputation: 1212
Is it possible to force the compiler to statically resolve a virtual function in a derived class when it is called indirectly, to avoid the vtable cost? Why?
I created a test to study the impact of the final keyword on vtable cost.
B is derived from class A.
A::f1() is a non-virtual function.
A::f2() is a virtual function. B overrides it.
A::f3() is a virtual function. B overrides it and marks it as final.
A::f4() is a non-virtual function. It calls A::f3().
I profiled the test and noticed that the relative costs of the functions are:
B*->f1() = 160
B*->f2() = 270 : The "virtual" costs a lot.
B*->f3() = 160 : The "final" yields a performance gain!
B*->f4() = 270 : Why not 160? <-- question
The compiler seems to look at B::f4(), try to call A::f3(), look at the vtable, and then call B::f3().
I believe the compiler should statically know that B*->f4() will call B*->f3(), so there should be no vtable cost.
f4() among every class derived from A. Thus, it is to prevent "code bloat", correct? f4 to be in A, and not appear in B.
Here is the test.
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

class A{
    public: int f1(){return randomNumber*3;}          // non-virtual
    public: virtual int f2(){return randomNumber*3;}  // virtual
    public: virtual int f3(){return randomNumber*3;}  // virtual, made final in B
    public: int f4(){return f3();}                    // non-virtual wrapper around f3
    public: int randomNumber=((double) rand() / (RAND_MAX))*10;
};
class B : public A {
    public: virtual int f2() {return randomNumber*4;}
    public: virtual int f3() final {return randomNumber*4;}
};
int main(){
    std::vector<B*> bs;
    const int numTest=10000;
    for(int n=0;n<numTest;n++){
        bs.push_back(new B());
    }
    int accu=0;
    for(int n=0;n<numTest;n++){
        accu+=bs[n]->f1(); //warm
    }
    auto t1= std::chrono::system_clock::now();
    for(int n=0;n<numTest;n++){
        accu+=bs[n]->f1(); //test 1 : base case, non virtual
    }
    auto t2= std::chrono::system_clock::now();
    for(int n=0;n<numTest;n++){
        accu+=bs[n]->f2(); //test 2: virtual
    }
    auto t3= std::chrono::system_clock::now();
    for(int n=0;n<numTest;n++){
        accu+=bs[n]->f3(); //test 3: virtual & final
    }
    auto t4= std::chrono::system_clock::now();
    for(int n=0;n<numTest;n++){
        accu+=bs[n]->f4(); //test 4: virtual & final & encapsulator
    }
    auto t5= std::chrono::system_clock::now();
    auto t21=t2-t1;
    auto t32=t3-t2;
    auto t43=t4-t3;
    auto t54=t5-t4;
    std::cout<<"test1 base ="<<t21.count()<<std::endl;
    std::cout<<"test2 virtual ="<<t32.count()<<std::endl;
    std::cout<<"test3 virtual & final ="<<t43.count()<<std::endl;
    std::cout<<"test4 virtual & final & indirect="<<t54.count()<<std::endl;
    std::cout<<"forbid optimize"<<accu; // use accu so the loops are not optimized away
}
Sorry if I use the wrong jargon; I am very new to C++.
This question comes from curiosity.
In practice, it can be solved by moving f4() to B, but I want to know the rationale behind it.
Upvotes: 1
Views: 100
Reputation: 3707
The problem is that there is no B::f4() in your example. So the only f4 is A::f4(), and that one must work with all classes derived from A.
As you noticed, you could write your own B::f4(), which would then hide A::f4() (not override it, since A::f4() is not virtual). The compiler would then call B::f4() when it knows you are accessing a B. In B::f4() the compiler should be smart enough to use B::f3() directly.
If you access a B through an A reference or pointer, the compiler would continue to use A::f4().
When I tried this on Compiler Explorer, which only has the 2017 compiler, B::f3 was inlined into B::f4, and both were inlined into the calling function, as expected.
When I did not define B::f4, A::f4 was inlined and still performed the virtual function call.
Your compiler seems unable to reason well about virtual function calls after inlining f4. I can only speculate about how the Microsoft compiler works in detail, but GCC and LLVM compile to a language-agnostic intermediate form (GIMPLE and LLVM IR respectively) and perform optimizations on that. At that point this becomes an aliasing problem: the compiler must statically prove that the entry in the virtual table is always B::f3. Usually it cannot be sure, and unfortunately the information about final methods does not seem to be propagated far enough. GCC at least performs speculative devirtualization if it seems profitable.
When no inlining happens, I think the compiler would have a very hard time optimizing this, even if it saw all the definitions at once, which is not guaranteed.
Providing an additional "specialization" of A::f4 for objects of type B would in theory be feasible, but I am not sure it gives enough average-case performance to be considered worthwhile by compiler developers.
One way to implement f4 such that the compiler generates the code variants you want without you having to repeat yourself would be as a template function external to A:
template <typename DerivedFromA>
inline int f4(DerivedFromA &x)
{
    return x.f3();
}
Upvotes: 1