Fureeish

Reputation: 13424

Why is '\n' preferred over "\n" for output streams?

In this answer we can read that:

I suppose there's little difference between using '\n' or using "\n", but the latter is an array of (two) characters, which has to be printed character by character, for which a loop has to be set up, which is more complex than outputting a single character.

emphasis mine

That makes sense to me. I would think that outputting a const char* requires a loop that tests for the null terminator, which must involve more operations than, let's say, a simple putchar (I'm not implying that std::cout with a char delegates to that; it's just a simplification to introduce an example).
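The reasoning in the quoted answer assumes roughly the following model. This is a deliberately simplified sketch, not how any real standard library implements operator<<; naive_print_cstr, naive_print_char and the capture helpers are hypothetical names used only for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified model of printing a C string: visit the
// characters one at a time, testing each one for the null terminator.
inline std::ostream& naive_print_cstr(std::ostream& os, const char* s) {
    for (; *s != '\0'; ++s) {  // loop + terminator test per character
        os.put(*s);
    }
    return os;
}

// Hypothetical, simplified model of printing one character:
// no loop and no terminator test, just a single put().
inline std::ostream& naive_print_char(std::ostream& os, char c) {
    return os.put(c);
}

// Helpers that capture what each model writes, so the two can be compared.
inline std::string capture_cstr(const char* s) {
    std::ostringstream os;
    naive_print_cstr(os, s);
    return os.str();
}

inline std::string capture_char(char c) {
    std::ostringstream os;
    naive_print_char(os, c);
    return os.str();
}
```

Both models write the same single newline; in this model the only extra cost of the string path is the loop and the per-character terminator test.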

That convinced me to use

std::cout << '\n';
std::cout << ' ';

rather than

std::cout << "\n";
std::cout << " ";

It's worth mentioning here that I am aware the performance difference is pretty much negligible. Nonetheless, some may argue that the former approach carries the intent of actually passing a single character, rather than a string literal that just happens to be one char long (two chars long if you count the '\0').

Recently I did a small code review for someone who was using the latter approach. I made a small comment on it and moved on. The developer then thanked me and said that he hadn't even thought of such a difference (mainly focusing on the intent). It was not impactful at all (unsurprisingly), but the change was adopted.

I then began wondering how exactly that change is significant, so I ran to godbolt. To my surprise, it showed the following results when tested on GCC (trunk) with the -std=c++17 -O3 flags. The generated assembly for the following code:

#include <iostream>

void str() {
    std::cout << "\n";
}

void chr() {
    std::cout << '\n';
}

int main() {
    str();
    chr();
}

surprised me, because it appears that chr() is actually generating exactly twice as many instructions as str() does:

.LC0:
        .string "\n"
str():
        mov     edx, 1
        mov     esi, OFFSET FLAT:.LC0
        mov     edi, OFFSET FLAT:_ZSt4cout
        jmp     std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
chr():
        sub     rsp, 24
        mov     edx, 1
        mov     edi, OFFSET FLAT:_ZSt4cout
        lea     rsi, [rsp+15]
        mov     BYTE PTR [rsp+15], 10
        call    std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        add     rsp, 24
        ret

Why is that? Why do both of them eventually call the same std::basic_ostream function with a const char* argument? Does it mean that the char literal approach is not only not better, but actually worse than the string literal one?

Upvotes: 31

Views: 2167

Answers (3)

catnip

Reputation: 25388

None of the other answers really explain why the compiler generates the code it does in your Godbolt link, so I thought I'd chip in.

If you look at the generated code, you can see that:

std::cout << '\n';

Compiles down to, in effect:

const char c = '\n';
std::__ostream_insert (std::cout, &c, 1);   // libstdc++ internal, as seen in the assembly

and to make this work, the compiler has to generate a stack frame for function chr(), which is where many of the extra instructions come from.
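The same "in effect" lowering can be written with the public std::ostream::write member instead of the library-internal function seen in the assembly. This is a sketch of the semantics only: it ignores width() padding, which operator<< honours and write does not. print_char_like_chr and capture are hypothetical names.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical hand-written equivalent of `os << c` for the char case.
// write() takes a pointer, so the character must live somewhere in memory;
// for a local variable that is the stack, which is why chr() needs a frame.
inline std::ostream& print_char_like_chr(std::ostream& os, char c) {
    const char buf = c;         // one-byte "buffer" that has an address
    return os.write(&buf, 1);   // pointer + length, like the call in the assembly
}

// Captures what print_char_like_chr writes, for comparison with operator<<.
inline std::string capture(char c) {
    std::ostringstream os;
    print_char_like_chr(os, c);
    return os.str();
}
```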

On the other hand, when compiling this:

std::cout << "\n";

the compiler can optimise str() to simply 'tail call' operator<< (const char *), which means that no stack frame is needed.

So your results are somewhat skewed by the fact that you put the calls to operator<< in separate functions. It's more revealing to make these calls inline, see: https://godbolt.org/z/OO-8dS

Now you can see that, while outputting '\n' is still a little more expensive (because there is no specialised fast path for ostream::operator<< (char); it goes through the same buffer-writing function as the string version), the difference is less marked than in your example.

Upvotes: 33

Michael Mahn

Reputation: 797

Keep in mind, though, that what you see in the assembly is only the setup of the call (stack frame and arguments), not the execution of the actual function.

std::cout << '\n'; is still slightly faster than std::cout << "\n";

I created this little program to measure the performance; with g++ -O3 it seemed slightly faster on my machine. Try it yourself!

Edit: Sorry, I noticed a typo in my program, and it's not that much faster after all! I can barely measure any difference anymore: sometimes one is faster, other times the other.

#include <chrono>
#include <iostream>

class timer {
    private:
        decltype(std::chrono::high_resolution_clock::now()) begin, end;

    public:
        void
        start() {
            begin = std::chrono::high_resolution_clock::now();
        }

        void
        stop() {
            end = std::chrono::high_resolution_clock::now();
        }

        template<typename T>
        auto
        duration() const {
            return std::chrono::duration_cast<T>(end - begin).count();
        }

        auto
        nanoseconds() const {
            return duration<std::chrono::nanoseconds>();
        }

        void
        printNS() const {
            std::cout << "Nanoseconds: " << nanoseconds() << std::endl;
        }
};

int
main(int argc, char** argv) {
    timer t1;
    t1.start();
    for (int i{0}; 10000 > i; ++i) {
        std::cout << '\n';
    }
    t1.stop();

    timer t2;
    t2.start();
    for (int i{0}; 10000 > i; ++i) {
        std::cout << "\n";
    }
    t2.stop();
    t1.printNS();
    t2.printNS();
}

Edit: As geza suggested, I tried 100000000 iterations for both, sent the output to /dev/null, and ran it four times. In each pair below, the first line is '\n' and the second is "\n". '\n' was slower in three runs and faster in one, but never by much, and it might be different on other machines:

Nanoseconds: 8668263707
Nanoseconds: 7236055911

Nanoseconds: 10704225268
Nanoseconds: 10735594417

Nanoseconds: 10670389416
Nanoseconds: 10658991348

Nanoseconds: 7199981327
Nanoseconds: 6753044774

I guess overall I wouldn't care too much.
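If you want to repeat the comparison, a variant along these lines may give steadier numbers: it writes into a std::ostringstream instead of std::cout, so terminal or /dev/null I/O is taken out of the picture, and it uses std::chrono::steady_clock, which is monotonic. Measurement and time_insertions are hypothetical names; this is a rough sketch, not a rigorous benchmark (no warm-up, no repeated runs).

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>

// Result of one measurement: elapsed time and bytes written (the byte count
// also gives the compiler a reason to keep the work).
struct Measurement {
    std::chrono::nanoseconds elapsed;
    std::size_t bytes;
};

// Times `iterations` insertions of `item` into an in-memory stream.
// For '\n', T is char; for "\n", T is char[2] and decays to const char*.
template <typename T>
Measurement time_insertions(const T& item, long iterations) {
    std::ostringstream os;
    const auto begin = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        os << item;
    }
    const auto end = std::chrono::steady_clock::now();
    return {std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin),
            os.str().size()};
}
```

For example, call time_insertions('\n', 100000000) and time_insertions("\n", 100000000) from main and print elapsed.count() for each; any difference is likely to be small and machine-dependent.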

Upvotes: 7

geza

Reputation: 29942

Yes, for this particular implementation and your example, the char version is a little bit slower than the string version.

Both versions call a write(buffer, bufferSize)-style function. For the string version, bufferSize is known at compile time (1 byte), so there is no need to find the zero terminator at run time. For the char version, the compiler creates a little 1-byte buffer on the stack, puts the character into it, and passes this buffer to write out. So the char version is a little bit slower.
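A sketch of the two code paths described above, assuming C++17. write_buffer is a hypothetical stand-in for the library-internal routine (in libstdc++, the std::__ostream_insert seen in the question's assembly) and is implemented here with the public std::ostream::write; print_str, print_chr and the capture helpers are also hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// String version: the literal's length is a compile-time constant,
// so no run-time search for the terminator is needed.
inline constexpr std::string_view newline_str = "\n";
static_assert(newline_str.size() == 1, "length computed at compile time");

// Hypothetical stand-in for the library's write(buffer, bufferSize) routine.
inline std::ostream& write_buffer(std::ostream& os, const char* data,
                                  std::streamsize n) {
    return os.write(data, n);
}

// Pointer to static storage plus a constant size: nothing to build at run time.
inline std::ostream& print_str(std::ostream& os) {
    return write_buffer(os, newline_str.data(),
                        static_cast<std::streamsize>(newline_str.size()));
}

// Char version: build a 1-byte buffer (a local, typically on the stack)
// and pass its address.
inline std::ostream& print_chr(std::ostream& os, char c) {
    const char buf[1] = {c};
    return write_buffer(os, buf, 1);
}

inline std::string capture_str() {
    std::ostringstream os;
    print_str(os);
    return os.str();
}

inline std::string capture_chr(char c) {
    std::ostringstream os;
    print_chr(os, c);
    return os.str();
}
```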

Upvotes: 5
