Reputation: 19140
I was testing various approaches to formatting doubles in C++, and here's some code I came up with:
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>
#include <sstream>
#include <iostream>
inline long double currentTime()
{
    const auto now = std::chrono::steady_clock::now().time_since_epoch();
    return std::chrono::duration<long double>(now).count();
}
int main()
{
    std::mt19937 mt(std::random_device{}());
    std::normal_distribution<long double> dist(0, 1e280);
    static const auto rng = [&]() { return dist(mt); };

    std::vector<double> numbers;
    for (int i = 0; i < 10000; ++i)
        numbers.emplace_back(rng());

    const int precMax = 200;
    const int precStep = 10;
    char buf[10000];

    std::cout << "snprintf\n";
    for (int precision = 10; precision <= precMax; precision += precStep)
    {
        const auto t0 = currentTime();
        for (const auto num : numbers)
            std::snprintf(buf, sizeof buf, "%.*e", precision, num);
        const auto t1 = currentTime();
        std::cout << "Precision " << precision << ": " << t1 - t0 << " s\n";
    }

    std::cout << "ostringstream\n";
    for (int precision = 10; precision <= precMax; precision += precStep)
    {
        std::ostringstream ss;
        ss.precision(precision);
        ss << std::scientific;
        const auto t0 = currentTime();
        for (const auto num : numbers)
        {
            ss.str("");
            ss << num;
        }
        const auto t1 = currentTime();
        std::cout << "Precision " << precision << ": " << t1 - t0 << " s\n";
    }
}
What makes me wonder is that at first, when precision is less than 40, I get more or less the same performance. But then the difference grows to 2.1x in favor of snprintf. See my output on a Core i7-4765T, Linux 32-bit, g++ 5.5.0, libc 2.14.1, compiled with -march=native -O3:
snprintf
Precision 10: 0.0262963 s
Precision 20: 0.035437 s
Precision 30: 0.0468597 s
Precision 40: 0.0584917 s
Precision 50: 0.0699653 s
Precision 60: 0.081446 s
Precision 70: 0.0925062 s
Precision 80: 0.104068 s
Precision 90: 0.115419 s
Precision 100: 0.128886 s
Precision 110: 0.138073 s
Precision 120: 0.149591 s
Precision 130: 0.161005 s
Precision 140: 0.17254 s
Precision 150: 0.184622 s
Precision 160: 0.195268 s
Precision 170: 0.206673 s
Precision 180: 0.218756 s
Precision 190: 0.230428 s
Precision 200: 0.241654 s
ostringstream
Precision 10: 0.0269695 s
Precision 20: 0.0383902 s
Precision 30: 0.0497328 s
Precision 40: 0.12028 s
Precision 50: 0.143746 s
Precision 60: 0.167633 s
Precision 70: 0.190878 s
Precision 80: 0.214735 s
Precision 90: 0.238105 s
Precision 100: 0.261641 s
Precision 110: 0.285149 s
Precision 120: 0.309025 s
Precision 130: 0.332283 s
Precision 140: 0.355797 s
Precision 150: 0.379415 s
Precision 160: 0.403452 s
Precision 170: 0.427337 s
Precision 180: 0.450668 s
Precision 190: 0.474012 s
Precision 200: 0.498061 s
So my main question is: what is the reason for this twofold difference? And additionally, how can I make ostringstream's performance closer to that of snprintf?
NOTE: another question, Why is snprintf faster than ostringstream or is it?, is different from mine. First, there's no specific answer there explaining why formatting a single number at different precisions becomes slower. Second, that question asks why it's slower in general, which is too broad to answer my question, while this one asks about one specific scenario: formatting a single double.
Upvotes: 4
Views: 597
Reputation: 19140
std::ostringstream calls vsnprintf twice: first to try with a small buffer, and a second time with the correctly sized buffer. See locale_facets.tcc around line 1011 (here std::__convert_from_v is a proxy for vsnprintf):
#if _GLIBCXX_USE_C99_STDIO
      // Precision is always used except for hexfloat format.
      const bool __use_prec =
        (__io.flags() & ios_base::floatfield) != ios_base::floatfield;

      // First try a buffer perhaps big enough (most probably sufficient
      // for non-ios_base::fixed outputs)
      int __cs_size = __max_digits * 3;
      char* __cs = static_cast<char*>(__builtin_alloca(__cs_size));
      if (__use_prec)
        __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                                      __fbuf, __prec, __v);
      else
        __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                                      __fbuf, __v);

      // If the buffer was not large enough, try again with the correct size.
      if (__len >= __cs_size)
        {
          __cs_size = __len + 1;
          __cs = static_cast<char*>(__builtin_alloca(__cs_size));
          if (__use_prec)
            __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                                          __fbuf, __prec, __v);
          else
            __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
                                          __fbuf, __v);
        }
This exactly matches the observation that for small requested precisions performance is the same as that of snprintf, while for larger precisions it's 2x poorer: the second __convert_from_v call kicks in. The numbers line up, too: for double, __max_digits is 15 here, so the first-try buffer is 15 * 3 = 45 bytes, and a %.*e conversion of these values is roughly precision + 8 characters long, so the first buffer stops sufficing just below precision 40.
Moreover, since the size of the first-try buffer doesn't depend on any properties of the std::ostringstream, only on __max_digits, which is defined as __gnu_cxx::__numeric_traits<_ValueT>::__digits10, there doesn't seem to be any natural fix for this other than fixing libstdc++ itself.
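That said, if you need the speed right away, you can sidestep num_put by doing the conversion with snprintf yourself and handing the finished characters to the stream. Below is a minimal sketch under the assumption that you don't need locale-aware output; putScientific is a hypothetical helper (not a standard or libstdc++ API), and it honors only the stream's precision, not its other formatting flags:
#include <cstdio>
#include <ostream>

// Hypothetical helper: convert with a single snprintf call and write the
// characters directly, bypassing num_put's two-pass __convert_from_v.
inline std::ostream& putScientific(std::ostream& os, double value)
{
    char buf[512]; // comfortably fits "%.*e" output for precision up to ~490
    const int len = std::snprintf(buf, sizeof buf, "%.*e",
                                  static_cast<int>(os.precision()), value);
    if (len > 0 && len < static_cast<int>(sizeof buf))
        os.write(buf, len);
    return os;
}
In the benchmark's inner loop, ss << num would become putScientific(ss, num); since this performs the same conversion as the snprintf loop plus one buffer copy, its timings should track the snprintf column.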
I've reported it as a bug to libstdc++.
Upvotes: 6