Why is StringCopyFromLiteral faster than StringCopyFromString?

Question

The Quick C++ Benchmarks example:

static void StringCopyFromLiteral(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    std::string from_literal("hello");
    // Make sure the variable is not optimized away by compiler
    benchmark::DoNotOptimize(from_literal);
  }
}
// Register the function as a benchmark
BENCHMARK(StringCopyFromLiteral);

static void StringCopyFromString(benchmark::State& state) {
  // Code before the loop is not measured
  std::string x = "hello";
  for (auto _ : state) {
    std::string from_string(x);
  }
}
// Register the function as a benchmark
BENCHMARK(StringCopyFromString);

http://quick-bench.com/IcZllt_14hTeMaB_sBZ0CQ8x2Ro

If only I understood assembly...

More results:

http://quick-bench.com/39fLTvRdpR5zdapKSj2ZzE3asCI

cdhowie · Accepted Answer

The answer is simple. When you construct a std::string from a small string literal, the compiler knows both the contents and the length at compile time, so it can populate the string object directly using constants embedded in the instructions. This avoids a copy loop as well as the runtime test for whether small string optimization (SSO) applies. Here the compiler already knows that SSO applies, so the generated code simply writes the characters straight into the SSO buffer.
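To see SSO for yourself, here is a minimal sketch that checks whether a string's characters live inside the std::string object itself. stored_inline is a hypothetical helper, not a standard function, and it assumes an implementation that uses SSO (libstdc++ with the C++11 ABI, libc++ and MSVC all do); the exact inline capacity differs between them.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the standard library): reports whether
// the character data of `s` is stored inside the std::string object itself,
// which is what small string optimization does for short strings.
bool stored_inline(const std::string& s) {
  const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
  const auto obj_end = obj_begin + sizeof(std::string);
  const auto data = reinterpret_cast<std::uintptr_t>(s.data());
  return data >= obj_begin && data < obj_end;
}
```

On those implementations, stored_inline(std::string("hello")) should be true, while a string of a hundred characters should report false because its data is on the heap.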

Note this assembly code in the StringCreation case (the benchmark that constructs from the literal, StringCopyFromLiteral in the question's listing):

// Populate SSO buffer (each set of 4 characters is backwards since
// x86 is little-endian)
19.63% movb   $0x6f,0x4(%r15)    // "o"
19.35% movl   $0x6c6c6568,(%r15) // "lleh"
// Set size
20.26% movq   $0x5,0x10(%rsp)    // size = 5
// Most likely the null terminator: a single zero byte at 0x1d(%rsp), which is
// index 5 of the SSO buffer if that buffer starts at 0x18(%rsp)
20.07% movb   $0x0,0x1d(%rsp)

You're looking at the constant values right there. That's not very much code, and no loop is required. In fact, the std::string constructor doesn't even have to be invoked! The compiler is just putting stuff in memory in the same places where the std::string constructor would.
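If the byte order in those constants looks confusing, the following sketch replays the two character stores into a plain local buffer. The function names are made up for illustration, and the result is only "hello" on a little-endian machine such as x86.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// True when the least significant byte of an integer is stored first.
bool is_little_endian() {
  const std::uint16_t one = 1;
  unsigned char first;
  std::memcpy(&first, &one, 1);
  return first == 1;
}

// Replays the two character stores from the listing into a local buffer.
// The buffer is zero-initialized, so the text is already null-terminated.
std::string replay_stores() {
  char buf[16] = {};
  const std::uint32_t low = 0x6c6c6568;  // movl $0x6c6c6568,(%r15)
  std::memcpy(buf, &low, sizeof low);    // bytes 68 65 6c 6c on little-endian
  buf[4] = 0x6f;                         // movb $0x6f,0x4(%r15)
  return std::string(buf);
}
```

The 32-bit constant reads as "lleh" when written as a number because its lowest byte, 0x68 ('h'), lands at the lowest address.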

If the compiler cannot apply this optimization, the results are quite different -- in particular, if we "hide" the fact that the source is a string literal by first copying the literal into a char array, the results flip:

char x[] = "hello";
for (auto _ : state) {
  std::string created_string(x);
  benchmark::DoNotOptimize(created_string);
}

Now the "from-char-pointer" case takes twice as long! Why?

I suspect that this is because the "copy from char pointer" case cannot find out how long the string is just by reading a stored value. It has to determine the length itself before it knows whether small string optimization can be applied. There are a few ways it could go about this:

- Measure the length of the string first, make an allocation (if needed), then copy the source to the destination. In the case where SSO does apply (it almost certainly does here) I'd expect this to take twice as long since it has to walk the source twice -- once to measure, once to copy.
- Copy from the source character-by-character, appending to the new string. This requires testing on each append operation whether the string is now too long for SSO and needs to be copied into a heap-allocated char array. If the string is currently in a heap-allocated array, it needs to instead test if the allocation needs to be resized. This would also take quite a bit longer since there is at least one test for each character in the source string.
- Copy from the source in chunks to lower the number of tests that need to be performed and to avoid walking the source twice. This would be faster than the character-by-character approach both because the number of tests would be lower and, because the source is not being walked twice, the CPU memory cache is going to be more effective. This would only show significant speed improvements for long strings, which we don't have here. For short strings it would work about the same as the first approach (measure, then copy).

Contrast this with the case where it's copying from another string object: it can simply look at the size() of the other string and immediately know whether SSO applies, and if it doesn't, it also knows exactly how much memory to allocate for the new string.
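The difference between the two paths can be sketched with a toy string type. This is only an illustration of the "measure, then copy" idea under assumed names and an assumed inline capacity of 15; it is not how any particular standard library lays out or implements std::string.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Toy model only; real std::string layouts differ per implementation.
struct ToyString {
  static constexpr std::size_t kInlineCapacity = 15;  // assumed SSO limit
  std::size_t size = 0;
  char inline_buf[kInlineCapacity + 1] = {};
  std::unique_ptr<char[]> heap;  // set only when size > kInlineCapacity

  const char* data() const { return heap ? heap.get() : inline_buf; }

  // Once the length is known: pick the storage, then copy in one pass.
  void assign(const char* src, std::size_t len) {
    char* dst = inline_buf;
    if (len > kInlineCapacity) {
      heap.reset(new char[len + 1]);
      dst = heap.get();
    } else {
      heap.reset();
    }
    std::memcpy(dst, src, len);
    dst[len] = '\0';
    size = len;
  }
};

// From a char pointer: the source is walked once by strlen to measure it,
// then a second time by memcpy to copy it.
ToyString from_c_string(const char* src) {
  ToyString s;
  s.assign(src, std::strlen(src));
  return s;
}

// From another string: the length is already stored, so no measuring pass.
ToyString from_string(const ToyString& other) {
  ToyString s;
  s.assign(other.data(), other.size);
  return s;
}
```

Both functions end in the same assign step; the only extra work in from_c_string is the strlen call, which is the cost the literal case avoids because the compiler already knows the length.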
