Reputation: 1234
I've been writing C++11 code for quite some time now, but I haven't benchmarked any of it; I just expected things like vector operations to "just be faster" now with move semantics. So when I actually benchmarked with GCC 4.7.2 and clang 3.0 (the default compilers on Ubuntu 12.10 64-bit), I got very unsatisfying results. This is my test code:
EDIT: With regard to the (good) answers posted by @DeadMG and @ronag, I changed the element type from std::string to my::string, which does not have a swap(), and made all inner strings larger (200-700 bytes) so that they shouldn't be victims of SSO.
EDIT2: COW was the reason. Following the great comments, I adapted the code again: I changed the storage from std::string to std::vector<char> and left out the copy/move constructors (letting the compiler generate them instead). Without COW, the speed difference is actually huge.
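A minimal sketch of why the compiler-generated constructors are enough: with a std::vector<char> member, the implicitly generated move constructor takes over the source's heap buffer, while the implicitly generated copy constructor allocates a new one. Payload, move_reuses_buffer and copy_allocates_buffer are hypothetical names standing in for the non-COW my::string; the pointer comparison reflects what mainstream standard libraries do, not explicit wording in the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the non-COW my::string: a std::vector<char>
// buffer plus a key, with no user-declared copy/move constructors, so the
// compiler generates both.
struct Payload
{
    std::vector< char > str;
    std::size_t val;
};

// True if the implicitly generated move constructor took over the source's
// heap buffer instead of allocating a new one.
bool move_reuses_buffer( std::size_t n )
{
    Payload a;
    a.str.assign( n, 'x' );
    a.val = 1;
    const char* buf = a.str.data( );
    Payload b = std::move( a ); // implicit move constructor
    return b.str.data( ) == buf && b.str.size( ) == n;
}

// True if the implicitly generated copy constructor allocated its own buffer.
bool copy_allocates_buffer( std::size_t n )
{
    Payload a;
    a.str.assign( n, 'x' );
    a.val = 1;
    Payload b = a; // implicit copy constructor
    return b.str.data( ) != a.str.data( ) && b.str == a.str;
}
```

Both checks use a 500-byte buffer in practice, roughly the size of the strings in the benchmark; a copy costs an allocation plus a memcpy per element, a move costs a few pointer assignments.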
EDIT3: Re-added the previous solution, enabled by compiling with -DCOW. This makes the internal storage a std::string rather than a std::vector<char>, as requested by @chico.
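```cpp
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <functional>
#include <utility> // std::move

// Counter used to generate the sort keys.
static std::size_t dec = 0;

namespace my {
class string
{
public:
    string( ) { }
    // Keys alternate between small values and huge (unsigned wrap-around)
    // values, so the vector is far from sorted.
#ifdef COW
    // Storage is a std::string (copy-on-write in GCC 4.7's libstdc++).
    string( const std::string& ref ) : str( ref ), val( dec % 2 ? - ++dec : ++dec ) { }
#else
    // Storage is a std::vector<char>, which always owns its own buffer.
    string( const std::string& ref ) : str( ref.begin( ), ref.end( ) ), val( dec % 2 ? - ++dec : ++dec ) { }
#endif
    bool operator<( const string& other ) const { return val < other.val; }
private:
#ifdef COW
    std::string str;
#else
    std::vector< char > str;
#endif
    std::size_t val;
};
}

// Doubles vec by appending every element of a snapshot copy; with -DCPP11
// the snapshot's elements are moved rather than copied.
template< typename T >
void dup_vector( T& vec )
{
    T v = vec; // snapshot: push_back may reallocate vec
    for ( typename T::iterator i = v.begin( ); i != v.end( ); ++i )
#ifdef CPP11
        vec.push_back( std::move( *i ) ); // move out of the snapshot
#else
        vec.push_back( *i );              // copy
#endif
}

int main( )
{
    std::ifstream file( "/etc/passwd" );
    std::vector< my::string > lines;
    std::string s;
    // Store each line repeated nine times to get long strings.
    while ( std::getline( file, s ) )
        lines.push_back( s + s + s + s + s + s + s + s + s );
    if ( lines.empty( ) )
        return 1; // nothing was read; doubling an empty vector would loop forever
    while ( lines.size( ) < ( 1000 * 1000 ) )
        dup_vector( lines );
    std::cout << lines.size( ) << " elements" << std::endl;
    std::sort( lines.begin( ), lines.end( ) );
    return 0;
}
```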
What this does is read /etc/passwd into a vector of lines, then duplicate this vector onto itself over and over until we have at least 1 million entries. This is where the first optimization should be useful: not only the explicit std::move() you see in dup_vector(), but also push_back per se should perform better when it needs to resize (allocate new storage + copy) the inner array.
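A small sketch of the resize part, using hypothetical demo types that are not part of the benchmark: when a std::vector reallocates, a C++11 standard library moves the existing elements only if their move constructor is noexcept (it goes through std::move_if_noexcept); otherwise it falls back to copying to keep the strong exception guarantee. The counters show which path was taken.

```cpp
#include <cassert>
#include <vector>

namespace demo {

// Counters incremented by the demo types below.
int copies = 0;
int moves = 0;

// Move constructor is noexcept: vector may move elements when it reallocates.
struct NothrowMove
{
    NothrowMove( ) { }
    NothrowMove( const NothrowMove& ) { ++copies; }
    NothrowMove( NothrowMove&& ) noexcept { ++moves; }
};

// Move constructor may throw: vector copies elements when it reallocates,
// so that a throwing move cannot leave the old buffer half-moved.
struct ThrowingMove
{
    ThrowingMove( ) { }
    ThrowingMove( const ThrowingMove& ) { ++copies; }
    ThrowingMove( ThrowingMove&& ) { ++moves; }
};

// Pushes n temporaries (each moved into the vector, never copied) and returns
// how many copies the reallocations made.
template< typename T >
int copies_during_growth( int n )
{
    copies = 0;
    moves = 0;
    std::vector< T > v;
    for ( int i = 0; i < n; ++i )
        v.push_back( T( ) );
    return copies;
}

} // namespace demo
```

An implicitly generated move constructor, like the one my::string gets after EDIT2, is noexcept whenever the move constructors of its members are, so it takes the fast path without any extra annotation.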
Finally, the vector is sorted. This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.
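To check the sorting claim in isolation, here is a sketch that counts the copies std::sort makes for two hypothetical element types: one with only C++98-style copy operations, and one that also has move operations. std::sort only requires its elements to be move-constructible and move-assignable, so with a C++11 standard library the second type is, in practice, sorted without a single copy.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace sortdemo {

int copies = 0;

// C++98-style element: user-declared copy operations suppress the implicit
// move operations, so every element shuffle inside std::sort copies.
struct CopyOnly
{
    int key;
    explicit CopyOnly( int k ) : key( k ) { }
    CopyOnly( const CopyOnly& o ) : key( o.key ) { ++copies; }
    CopyOnly& operator=( const CopyOnly& o ) { key = o.key; ++copies; return *this; }
    bool operator<( const CopyOnly& o ) const { return key < o.key; }
};

// C++11-style element: still counts copies, but also provides move
// operations, which std::sort uses instead.
struct Movable
{
    int key;
    explicit Movable( int k ) : key( k ) { }
    Movable( const Movable& o ) : key( o.key ) { ++copies; }
    Movable( Movable&& o ) noexcept : key( o.key ) { }
    Movable& operator=( const Movable& o ) { key = o.key; ++copies; return *this; }
    Movable& operator=( Movable&& o ) noexcept { key = o.key; return *this; }
    bool operator<( const Movable& o ) const { return key < o.key; }
};

// Sorts n elements given in descending order and returns the number of copies.
template< typename T >
int copies_during_sort( int n )
{
    std::vector< T > v;
    v.reserve( n );
    for ( int i = n; i > 0; --i )
        v.push_back( T( i ) );
    copies = 0; // ignore copies made while filling the vector
    std::sort( v.begin( ), v.end( ) );
    return copies;
}

} // namespace sortdemo
```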
I compile and run this two ways with each compiler: once as C++98, and once as C++11 (with -DCPP11 for the explicit move):
1> $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
2> $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
3> $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
4> $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
With the following results (twice for each compilation):
GCC C++98
1> real 0m9.626s
1> real 0m9.709s
GCC C++11
2> real 0m10.163s
2> real 0m10.130s
So it's slightly slower when compiled as C++11 code. Similar results hold for clang:
clang C++98
3> real 0m8.906s
3> real 0m8.750s
clang C++11
4> real 0m8.858s
4> real 0m9.053s
Can someone tell me why this is? Are the compilers optimizing so well even when compiling for pre-C++11 that they practically achieve move-semantics behaviour anyway? If I add -O2, all code runs faster, but the results between the different standards are almost the same as above.
EDIT: New results with my::string rather than std::string, and larger individual strings:
$ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
real 0m16.637s
$ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m17.169s
$ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
real 0m16.222s
$ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m15.652s
There are very small differences between C++98 and C++11 with move semantics: slightly slower with C++11 with GCC and slightly faster with clang, but still very small differences.
EDIT2: Now, without std::string's COW, the performance improvement is huge:
$ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
real 0m10.313s
$ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m5.267s
$ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
real 0m10.218s
$ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m3.376s
With optimization, the difference is a lot bigger too:
$ rm -f a.out ; g++ -O2 --std=c++98 test.cpp ; time ./a.out
real 0m5.243s
$ rm -f a.out ; g++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m0.803s
$ rm -f a.out ; clang++ -O2 --std=c++98 test.cpp ; time ./a.out
real 0m5.248s
$ rm -f a.out ; clang++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
real 0m0.785s
The above shows C++11 being roughly 6-7 times faster with -O2.
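For reference, the factors are just ratios of the single-run wall-clock times listed in EDIT2, so treat them as rough:

```latex
\begin{aligned}
\text{no optimization:}\quad & \tfrac{10.313}{5.267} \approx 2.0 \ (\text{GCC}), && \tfrac{10.218}{3.376} \approx 3.0 \ (\text{clang})\\
\text{-O2:}\quad & \tfrac{5.243}{0.803} \approx 6.5 \ (\text{GCC}), && \tfrac{5.248}{0.785} \approx 6.7 \ (\text{clang})
\end{aligned}
```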
Thanks for the great comments and answers. I hope this post will be useful and interesting to others too.
Upvotes: 21
Views: 4581
Reputation: 98348
I think that you'll need to profile the program. Maybe most of the time is spent in the line T v = vec; and in the std::sort(..) of a vector of more than a million strings! Nothing to do with move semantics.
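Short of running a real profiler, a rough way to see which phase dominates is to time each one separately. A minimal sketch using std::chrono::steady_clock; seconds_to_run is a hypothetical helper, and wall-clock numbers from a single run are noisy:

```cpp
#include <algorithm> // used by the usage example
#include <cassert>
#include <chrono>
#include <string>    // used by the usage example
#include <vector>    // used by the usage example

// Hypothetical helper: runs f once and returns the elapsed wall-clock time
// in seconds, measured with a monotonic clock.
template< typename F >
double seconds_to_run( F f )
{
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now( );
    f( );
    std::chrono::duration< double > elapsed = std::chrono::steady_clock::now( ) - start;
    return elapsed.count( );
}
```

For example, wrapping std::sort( lines.begin( ), lines.end( ) ) in a lambda passed to seconds_to_run inside the question's main() would time just the sort, and doing the same around T v = vec; in dup_vector() would time the snapshot copy.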
Upvotes: 2
Reputation: 146910
This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.
std::string has a swap member, so sort will already use that, and its internal implementation is already, effectively, move semantics. And you won't see a difference between copy and move for std::string as long as SSO is involved. In addition, some versions of GCC still have a COW-based implementation, which C++11 does not permit, and which would also not see much difference between copy and move.
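A quick sketch of what "effectively move semantics" means for swap: swapping two long std::strings exchanges their heap buffers instead of copying characters. swap_exchanges_buffers is a hypothetical test function; the standard does not promise that the data() pointers are exchanged, but this is what mainstream implementations (including GCC's COW string) do for heap-allocated strings.

```cpp
#include <cassert>
#include <string>

// Checks that std::string::swap exchanges the two heap buffers instead of
// copying characters. 300 and 700 characters are far too long for any
// small-string buffer, so both strings live on the heap.
bool swap_exchanges_buffers( )
{
    std::string a( 700, 'a' );
    std::string b( 300, 'b' );
    const char* pa = a.data( );
    const char* pb = b.data( );
    a.swap( b );
    return a.data( ) == pb && b.data( ) == pa
        && a.size( ) == 300 && b.size( ) == 700;
}
```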
Upvotes: 14
Reputation: 51255
This is probably due to the small string optimization, which can occur (depending on the compiler) for strings shorter than e.g. 16 characters. I would guess that all the lines in the file are quite short, since they are passwords.
When the small string optimization is active for a particular string, a move is done as a copy.
You will need to have larger strings to see any speed improvements with move semantics.
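One hypothetical way to see whether a given std::string is affected is to check whether its characters live inside the std::string object itself. This is a heuristic sketch (data_is_inline is not a standard function); std::less is used because comparing pointers into unrelated objects with a plain < is not guaranteed to give a meaningful order.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: true if the string's characters are stored inside the
// std::string object itself (small string optimization), in which case a
// move has to copy them. std::less gives a total order over pointers.
bool data_is_inline( const std::string& s )
{
    const char* begin = reinterpret_cast< const char* >( &s );
    const char* end = begin + sizeof( s );
    const char* p = s.data( );
    std::less< const char* > lt;
    return !lt( p, begin ) && lt( p, end );
}
```

With GCC 4.7's COW std::string this typically returns false even for short strings, because that implementation keeps the characters in a separately allocated block; with libstdc++'s newer C++11 ABI, strings of up to 15 characters typically return true. A 700-character string can never be inline, since it does not fit in the object.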
Upvotes: 2