Reputation: 1602
I am amazed at how many topics on StackOverflow deal with finding out the endianness of the system and converting endianness, and even more amazed that there are hundreds of different answers to these two questions. All proposed solutions that I have seen so far are based on undefined behaviour, non-standard compiler extensions or OS-specific header files. In my opinion, this question is only a duplicate if an existing answer gives a standard-compliant, efficient (e.g., compiles down to x86 bswap), compile-time-enabled solution.
Surely there must be a standard-compliant solution available that I am unable to find in the huge mess of old "hacky" ones. It is also somewhat strange that the standard library does not include such a function. Perhaps the attitude towards such issues is changing, since C++20 introduced a way to detect endianness into the standard (via std::endian), and C++23 will probably include std::byteswap, which flips endianness.
In any case, my questions are these:
Starting with which C++ standard is there a portable, standard-compliant way of performing host-to-network byte order conversion?
I argue below that it's possible in C++20. Is my code correct and can it be improved?
Should such a pure-C++ solution be preferred to OS-specific functions such as, e.g., POSIX htonl? (I think yes.)
I think I can give a C++23 solution that is OS-independent, efficient (no system call; compiles to x86 bswap) and portable to little-endian and big-endian systems (but not portable to mixed-endian systems):
// requires C++23. see https://gcc.godbolt.org/z/6or1sEvKn
#include <concepts> // for std::integral
#include <bit>      // for std::endian and std::byteswap

constexpr inline auto host_to_net(std::integral auto i) {
    static_assert(std::endian::native == std::endian::big ||
                  std::endian::native == std::endian::little);
    if constexpr (std::endian::native == std::endian::big) {
        return i;
    } else {
        return std::byteswap(i);
    }
}
Since std::endian is available in C++20, one can give a C++20 solution for host_to_net by implementing byteswap manually. A solution is described here; quote:
// requires C++17
#include <climits>
#include <cstdint>
#include <type_traits>
#include <utility> // for std::index_sequence (missing in the original quote)

template<class T, std::size_t... N>
constexpr T bswap_impl(T i, std::index_sequence<N...>) {
    return ((((i >> (N * CHAR_BIT)) & (T)(unsigned char)(-1)) <<
             ((sizeof(T) - 1 - N) * CHAR_BIT)) | ...); // fold expression
}

template<class T, class U = typename std::make_unsigned<T>::type>
constexpr U bswap(T i) {
    return bswap_impl<U>(i, std::make_index_sequence<sizeof(T)>{});
}
The linked answer also provides a C++11 byteswap, but that one seems to be less efficient (it is not compiled to x86 bswap). I think there should be an efficient C++11 way of doing this, too (using either less template-nonsense or even more), but I don't care about older C++ and didn't really try.
Assuming I am correct, the remaining question is: can one determine system endianness before C++20 at compile time in a standard-compliant and compiler-agnostic way? None of the answers here seem to achieve this. They use reinterpret_cast (not compile time), OS headers, union aliasing (which I believe is UB in C++), etc. Also, for some reason, they try to do it "at runtime", although a compiled executable will always run under the same endianness.
One could do it outside of a constexpr context and hope it's optimized away. Alternatively, one could use system-defined preprocessor definitions and account for all platforms, as seems to be the approach taken by Boost. Or maybe (although I would guess the other way is better?) use macros and pick platform-specific htonl-style functions from networking libraries (done, e.g., here (GitHub))?
Upvotes: 5
Views: 661
Reputation: 1602
I made a benchmark comparing my C++ solution from the question and the solution by eeroika from the accepted answer.
Looking into this was a complete waste of time, but now that I did it, I thought I might as well share it. The result is that (in the specific, not-quite-realistic use case I look at) they seem to be equivalent in terms of performance. This is despite my solution being compiled to use x86 bswap, while the solution by eeroika does it by just using mov.
The performance differs a lot (!!) when using different compilers, and the main thing I learned from these benchmarks is, again, that I was just wasting my time...
// benchmark comparing two stand-alone C++20 host-to-big-endian endianness conversions.
// Run at quick-bench.com! This is not a complete program. (https://quick-bench.com/q/2qnr4xYKemKLZupsicVFV_09rEk)
// To run locally, include the Google Benchmark header and a main method as required by the benchmarking library.
// Adapted from https://stackoverflow.com/a/71004000/9988487
#include <type_traits>
#include <utility>
#include <concepts>
#include <cstddef>
#include <cstdint>
#include <climits>
#include <cstring>
#include <bit>
#include <limits>
#include <random>
#include <ranges>
#include <vector>
/////////////////////////////// Solution 1 ////////////////////////////////
template <typename T> struct scalar_t { T t{}; /* no begin/end */ };
static_assert(not std::ranges::range< scalar_t<int> >);

template<class T, std::size_t... N>
constexpr T bswap_impl(T i, std::index_sequence<N...>) noexcept {
    constexpr auto bits_per_byte = 8u;
    static_assert(bits_per_byte == CHAR_BIT);
    return ((((i >> (N * bits_per_byte)) & (T)(unsigned char)(-1)) <<
             ((sizeof(T) - 1 - N) * bits_per_byte)) | ...); // fold expression
}

template<class T, class U = typename std::make_unsigned<T>::type>
constexpr U bswap(T i) noexcept {
    return bswap_impl<U>(i, std::make_index_sequence<sizeof(T)>{});
}

constexpr inline auto host_to_net(std::integral auto i) {
    static_assert(std::endian::native == std::endian::big ||
                  std::endian::native == std::endian::little);
    if constexpr (std::endian::native == std::endian::big) {
        return i;
    } else {
        return bswap(i); // replace by `std::byteswap` once it's available!
    }
}
/////////////////////////////// Solution 2 ////////////////////////////////
// helper to promote an integer type
template <class T>
using promote_t = std::decay_t<decltype(+std::declval<T>())>;

template <class T, std::size_t... I>
constexpr void host_to_big_impl(
        unsigned char* buf,
        T t,
        [[maybe_unused]] std::index_sequence<I...>) noexcept {
    using U = std::make_unsigned_t<promote_t<T>>;
    constexpr U lastI = sizeof(T) - 1u;
    constexpr U bits = 8u;
    U u = t;
    ( (buf[I] = u >> ((lastI - I) * bits)), ... );
}

template <class T>
constexpr void host_to_big(unsigned char* buf, T t) noexcept {
    using Indices = std::make_index_sequence<sizeof(T)>;
    return host_to_big_impl<T>(buf, t, Indices{});
}
//////////////////////// Benchmarks ////////////////////////////////////
template<std::integral T>
std::vector<T> get_random_vector(std::size_t length, unsigned int seed) {
    // NOTE: it is very slow to recreate the RNG every time. Don't use in production code!
    std::mt19937_64 rng{seed};
    std::uniform_int_distribution<T> distribution(
        std::numeric_limits<T>::min(), std::numeric_limits<T>::max());
    std::vector<T> result(length);
    for (auto && val : result) {
        val = distribution(rng);
    }
    return result;
}

template<>
std::vector<bool> get_random_vector<bool>(std::size_t length, unsigned int seed) {
    // NOTE: it is very slow to recreate the RNG every time. Only use for testing!
    std::mt19937_64 rng{seed};
    std::bernoulli_distribution distribution{0.5};
    std::vector<bool> vec(length);
    for (auto && val : vec) {
        val = distribution(rng);
    }
    return vec;
}
constexpr std::size_t n_ints{1000};

static void solution1(benchmark::State& state) {
    std::vector<int> intvec = get_random_vector<int>(n_ints, 0);
    std::vector<std::uint8_t> buffer(sizeof(int)*intvec.size());
    for (auto _ : state) {
        for (std::size_t i{}; i < intvec.size(); ++i) {
            // Solution 1: byteswap the value, then copy its bytes out
            // (the original draft stored only one byte here, truncating the int).
            const auto big = host_to_net(intvec[i]);
            std::memcpy(buffer.data() + sizeof(int)*i, &big, sizeof(int));
        }
        benchmark::DoNotOptimize(buffer);
        benchmark::ClobberMemory();
    }
}
BENCHMARK(solution1);

static void solution2(benchmark::State& state) {
    std::vector<int> intvec = get_random_vector<int>(n_ints, 0);
    std::vector<std::uint8_t> buffer(sizeof(int)*intvec.size());
    for (auto _ : state) {
        for (std::size_t i{}; i < intvec.size(); ++i) {
            // Solution 2: write the bytes in big-endian order directly.
            host_to_big(buffer.data() + sizeof(int)*i, intvec[i]);
        }
        benchmark::DoNotOptimize(buffer);
        benchmark::ClobberMemory();
    }
}
BENCHMARK(solution2);
Upvotes: 0
Reputation: 238311
compile time-enabled solution.
Consider whether this is a useful requirement in the first place. The program isn't going to be communicating with another system at compile time. In which case would you need to use the serialised integer in a compile-time constant context?
- Starting at what C++ standard is there a portable standard-compliant way of performing host to network byte order conversion?
It has been possible to write such a function in standard C++ since C++98. That said, later standards bring tasty template goodies that make this nicer.
There is no such function in the standard library as of the latest standard.
- Should such a pure-c++ solution be preferred to OS specific functions such as, e.g., POSIX-htonl? (I think yes)
The advantage of POSIX is that it's less important to write tests to make sure it works correctly.
The advantage of a pure C++ function is that you don't need platform-specific alternatives for systems that don't conform to POSIX.
Also, the POSIX htonX functions are only for 16-bit and 32-bit integers. You could use the htobeXX functions instead, which exist on some *BSDs and on Linux (glibc).
Here is what I have been using since C++17. Some notes beforehand:
Since endianness conversion is always¹ for purposes of serialisation, I write the result directly into a buffer. When converting to host endianness, I read from a buffer.
I don't use CHAR_BIT because the network doesn't know my byte size anyway. A network byte is an octet, and if your CPU's byte is different, then these functions won't work. Correct handling of a non-octet byte is possible but unnecessary work unless you need to support network communication on such a system. Adding an assert might be a good idea.
I prefer to call it big endian rather than "network" endian. There's a chance that a reader isn't aware of the convention that the de-facto endianness of the network is big.
Instead of checking "if native endianness is X, do Y, else do Z", I prefer to write a function that works with all native endiannesses. This can be done with bit shifts.
Yeah, it's constexpr. Not because it needs to be, but because it can be. I haven't been able to produce an example where dropping constexpr would result in worse code.
// helper to promote an integer type
template <class T>
using promote_t = std::decay_t<decltype(+std::declval<T>())>;

template <class T, std::size_t... I>
constexpr void
host_to_big_impl(
    unsigned char* buf,
    T t,
    [[maybe_unused]] std::index_sequence<I...>) noexcept
{
    using U = std::make_unsigned_t<promote_t<T>>;
    constexpr U lastI = sizeof(T) - 1u;
    constexpr U bits = 8u;
    U u = t;
    ( (buf[I] = u >> ((lastI - I) * bits)), ... );
}

template <class T>
constexpr void
host_to_big(unsigned char* buf, T t) noexcept
{
    using Indices = std::make_index_sequence<sizeof(T)>;
    return host_to_big_impl<T>(buf, t, Indices{});
}
¹ In all use cases I've encountered. Conversions from integer to integer can be implemented by delegating to these functions, if you have such a case, although they cannot be constexpr due to the need for reinterpret_cast.
Upvotes: 4