Reputation: 13424
In this answer we can read that:
I suppose there's little difference between using '\n' or using "\n", but the latter is an array of (two) characters, which has to be printed character by character, for which a loop has to be set up, which is more complex than outputting a single character.
emphasis mine
That makes sense to me. I would think that outputting a const char* requires a loop that tests for the null terminator, which must introduce more operations than, say, a simple putchar (not implying that std::cout with char delegates to calling that - it's just a simplification to introduce an example).
That convinced me to use
std::cout << '\n';
std::cout << ' ';
rather than
std::cout << "\n";
std::cout << " ";
It's worth mentioning here that I am aware of the performance difference being pretty much negligible. Nonetheless, some may argue that the former approach carries the intent of actually passing a single character, rather than a string literal that just happens to be one char long (two chars long if you count the '\0').
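To make that concrete, the type and size difference can be checked at compile time with a small sketch like this:
#include <type_traits>

// '\n' is a single char; "\n" is an array of two chars: '\n' plus the terminating '\0'.
static_assert(sizeof('\n') == 1, "'\\n' is a single char");
static_assert(sizeof("\n") == 2, "\"\\n\" is a const char[2]");
static_assert(std::is_same_v<decltype('\n'), char>, "character literal");
static_assert(std::is_same_v<decltype("\n"), const char(&)[2]>, "string literal");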
Lately I've done some little code reviews for someone who was using the latter approach. I made a small comment on the case and moved on. The developer then thanked me and said that he hadn't even thought of such a difference (mainly focusing on the intent). It was not impactful at all (unsurprisingly), but the change was adopted.
I then began wondering how exactly that change is significant, so I ran to Godbolt. To my surprise, it showed the following results when tested on GCC (trunk) with the -std=c++17 -O3 flags. The generated assembly for the following code:
#include <iostream>

void str() {
    std::cout << "\n";
}

void chr() {
    std::cout << '\n';
}

int main() {
    str();
    chr();
}
surprised me, because it appears that chr() actually generates exactly twice as many instructions as str() does:
.LC0:
        .string "\n"
str():
        mov     edx, 1
        mov     esi, OFFSET FLAT:.LC0
        mov     edi, OFFSET FLAT:_ZSt4cout
        jmp     std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
chr():
        sub     rsp, 24
        mov     edx, 1
        mov     edi, OFFSET FLAT:_ZSt4cout
        lea     rsi, [rsp+15]
        mov     BYTE PTR [rsp+15], 10
        call    std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        add     rsp, 24
        ret
Why is that? Why do both of them eventually call the same std::basic_ostream function with a const char* argument? Does it mean that the char literal approach is not only not better, but actually worse than the string literal one?
Upvotes: 31
Views: 2167
Reputation: 25388
None of the other answers really explain why the compiler generates the code it does in your Godbolt link, so I thought I'd chip in.
If you look at the generated code, you can see that:
std::cout << '\n';
compiles down to, in effect:
const char c = '\n';
std::cout.write (&c, 1);
and to make this work, the compiler has to generate a stack frame for the function chr(), which is where many of the extra instructions come from.
On the other hand, when compiling this:
std::cout << "\n";
the compiler can optimise str() to simply 'tail call' operator<< (const char *), which means that no stack frame is needed.
So your results are somewhat skewed by the fact that you put the calls to operator<< in separate functions. It's more revealing to make these calls inline; see: https://godbolt.org/z/OO-8dS
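Making the calls inline means something along these lines (a sketch; the exact code behind the link may differ):
#include <iostream>

int main() {
    // Both statements sit directly in main(), so neither needs its own
    // function (and thus its own stack-frame handling) around the call.
    std::cout << "\n";
    std::cout << '\n';
}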
Now you can see that, while outputting '\n' is still a little more expensive (because there is no specific overload for ostream::operator<< (char)), the difference is less marked than in your example.
Upvotes: 33
Reputation: 797
Keep in mind though that what you see in the assembly is only the creation of the call stack, not the execution of the actual function.
std::cout << '\n';
is still slightly faster than std::cout << "\n";
I've created this little program to measure the performance, and it's slightly faster on my machine with g++ -O3. Try it yourself!
Edit: Sorry, I noticed a typo in my program, and it's not that much faster! I can barely measure any difference anymore. Sometimes one is faster, other times the other.
#include <chrono>
#include <iostream>

class timer {
private:
    decltype(std::chrono::high_resolution_clock::now()) begin, end;

public:
    void
    start() {
        begin = std::chrono::high_resolution_clock::now();
    }

    void
    stop() {
        end = std::chrono::high_resolution_clock::now();
    }

    template<typename T>
    auto
    duration() const {
        return std::chrono::duration_cast<T>(end - begin).count();
    }

    auto
    nanoseconds() const {
        return duration<std::chrono::nanoseconds>();
    }

    void
    printNS() const {
        std::cout << "Nanoseconds: " << nanoseconds() << std::endl;
    }
};

int
main(int argc, char** argv) {
    timer t1;
    t1.start();
    for (int i{0}; 10000 > i; ++i) {
        std::cout << '\n';
    }
    t1.stop();

    timer t2;
    t2.start();
    for (int i{0}; 10000 > i; ++i) {
        std::cout << "\n";
    }
    t2.stop();

    t1.printNS();
    t2.printNS();
}
Edit: As geza suggested, I tried 100000000 iterations for both, sent the output to /dev/null, and ran it four times. '\n' was slower once and faster three times, but never by much; it might be different on other machines:
Nanoseconds: 8668263707
Nanoseconds: 7236055911
Nanoseconds: 10704225268
Nanoseconds: 10735594417
Nanoseconds: 10670389416
Nanoseconds: 10658991348
Nanoseconds: 7199981327
Nanoseconds: 6753044774
I guess overall I wouldn't care too much.
Upvotes: 7
Reputation: 29942
Yes, for this particular implementation, for your example, the char version is a little bit slower than the string version.
Both versions call a write(buffer, bufferSize)-style function. For the string version, bufferSize is known at compile time (1 byte), so there is no need to find the zero terminator at run time. For the char version, the compiler creates a little 1-byte buffer on the stack, puts the character into it, and passes this buffer to be written out. So, the char version is a little bit slower.
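A rough sketch of that idea, using ostream::write as a stand-in for the std::__ostream_insert call visible in the question's assembly:
#include <iostream>

void str_version() {
    // The length of "\n" (excluding the '\0') is a compile-time constant: 1.
    std::cout.write("\n", 1);
}

void chr_version() {
    // The character is first stored in a 1-byte buffer on the stack,
    // and that buffer plus its length are passed on.
    char buf = '\n';
    std::cout.write(&buf, 1);
}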
Upvotes: 5