Daniel McLaury
Daniel McLaury

Reputation: 4303

Undefined behavior allowed in constexpr -- compiler bug?

My understanding is that:

It seems to follow that something like the following should not compile, and indeed on my compiler it doesn't.

template<int n> struct S { };

template<int a, int b>
S<a * b> f()
{
  return S<a * b>();
}

int main(int, char **)
{
  f<50000, 49999>();
  return 0;
}

However, now I try the following instead:

#include <numeric>

template<int n> struct S { };

template<int a, int b>
S<std::lcm(a, b)> g()
{
  return S<std::lcm(a,b)>();
}

int main(int, char **)
{
  g<50000, 49999>();
  return 0;
}

Each of g++, clang, and MSVC will happily compile this, despite the fact that

The behavior is undefined if |m|, |n|, or the least common multiple of |m| and |n| is not representable as a value of type std::common_type_t<M, N>.

(Source: https://en.cppreference.com/w/cpp/numeric/lcm)

Is this a bug in all three compilers? Or is cppreference wrong about lcm's behavior being undefined if it can't represent the result?

Upvotes: 8

Views: 433

Answers (3)

doctorlove
doctorlove

Reputation: 19317

A late reply, but it was a bug in gcc and is now fixed: See https://gcc.gnu.org/PR105844

Quoting some details:

    When I fixed PR libstdc++/92978 I introduced a regression whereby
    std::lcm(INT_MIN, 1) and std::lcm(50000, 49999) would no longer produce
    errors during constant evaluation. Those calls are undefined, because
    they violate the preconditions that |m| and the result can be
    represented in the return type (which is int in both those cases). The
    regression occurred because __absu<unsigned>(INT_MIN) is well-formed,
    due to the explicit casts to unsigned in that new helper function, and
    the out-of-range multiplication is well-formed, because unsigned
    arithmetic wraps instead of overflowing.

Upvotes: 2

Brian Bi
Brian Bi

Reputation: 119587

According to [expr.const]/5, "an operation that would have undefined behavior as specified in [intro] through [cpp]" is not permitted during constant evaluation, but:

If E satisfies the constraints of a core constant expression, but evaluation of E would evaluate an operation that has undefined behavior as specified in [library] through [thread], or an invocation of the va_­start macro ([cstdarg.syn]), it is unspecified whether E is a core constant expression.

We usually summarize this as "language UB must be diagnosed in a context that requires a constant expression, but library UB does not necessarily need to be diagnosed".

The reason for this rule is that an operation that causes library UB may or may not cause language UB, and it would be difficult for compilers to consistently diagnose library UB even in cases when it doesn't cause language UB. (In fact, even some forms of language UB are not consistently diagnosed by current implementations.)

Some people also refer to language UB as "hard" UB and library UB as "soft" UB, but I don't like this terminology because (in my opinion) it encourages users to think of "code for which it's unspecified whether language UB occurs" as somehow less bad than "code that unambiguously has language UB". But in both cases, the result is that the programmer cannot write a program that executes such code and expect anything to work properly.

Upvotes: 15

Goswin von Brederlow
Goswin von Brederlow

Reputation: 12362

The problem is that std::lcm() is doing computations using unsigned of whatever type the arguments are. It uses using _Up = make_unsigned_t<common_type_t<_Mn, _Nn>>; in my STL and converts all arguments to _Up first. 50000 * 49999 = 2499950000 < 4294967296 = 2^32 does not cause an overflow and unsigned overflow would not be UB in any case.

But if you have template code for gcd and lcm like this without changing types: https://godbolt.org/z/zoxzsr45x

// GCD implementation
template<typename T, T m, T n>
constexpr T
gcd()
{
    if constexpr (m == 0) {
        return n;
    } else if constexpr (n == 0) {
        return m;
    } else {
        return gcd<T, n, T(m % n)>();
    }
}

// LCM implementation
template<typename T, T m, T n>
constexpr T
lcm()
{
    if constexpr (m != 0 && n != 0) {
        return (m / gcd<T, m, n>()) * n;
    } else {
        return 0;
    }
}

constinit auto t = lcm<int, 50000, 49999>();

int main(int, char **)
{
  return 0;
}

Then the compiler fails with:

<source>: In instantiation of 'constexpr T lcm() [with T = int; T m = 50000; T n = 49999]':
<source>:27:42:   required from here
<source>:21:37: warning: integer overflow in expression of type 'int' results in '-1795017296' [-Woverflow]
   21 |         return (m / gcd<T, m, n>()) * n;
      |                ~~~~~~~~~~~~~~~~~~~~~^~~
<source>:27:16: error: 'constinit' variable 't' does not have a constant initializer
   27 | constinit auto t = lcm<int, 50000, 49999>();
      |                ^
<source>:27:42:   in 'constexpr' expansion of 'lcm<int, 50000, 49999>()'
<source>:27:43: error: overflow in constant expression [-fpermissive]
   27 | constinit auto t = lcm<int, 50000, 49999>();
      |                                           ^

In gcc-10 under Debian std::lcm is defined as:

  // std::abs is not constexpr, doesn't support unsigned integers,
  // and std::abs(std::numeric_limits<T>::min()) is undefined.
  template<typename _Up, typename _Tp>
    constexpr _Up
    __absu(_Tp __val)
    {
      static_assert(is_unsigned<_Up>::value, "result type must be unsigned");
      static_assert(sizeof(_Up) >= sizeof(_Tp),
          "result type must be at least as wide as the input type");
      return __val < 0 ? -(_Up)__val : (_Up)__val;
    }
  /// Least common multiple
  template<typename _Mn, typename _Nn>
    constexpr common_type_t<_Mn, _Nn>
    lcm(_Mn __m, _Nn __n) noexcept
    {
      static_assert(is_integral_v<_Mn>, "std::lcm arguments must be integers");
      static_assert(is_integral_v<_Nn>, "std::lcm arguments must be integers");
      static_assert(_Mn(2) == 2, "std::lcm arguments must not be bool");
      static_assert(_Nn(2) == 2, "std::lcm arguments must not be bool");
      using _Up = make_unsigned_t<common_type_t<_Mn, _Nn>>;
      return __detail::__lcm(__detail::__absu<_Up>(__m),
                             __detail::__absu<_Up>(__n));
    }

The cast to _Up and return type of __absu causes the UB to go away.

Upvotes: 2

Related Questions