Reputation: 131445

Handling type-erased data at runtime - how not to reinvent the wheel?

I'm working on some code which gets data that looks like this:

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP };
struct buffer {
    data_type element_type;
    size_t    size; // in elements of element_type, not bytes
    void*     data;
};

(this is simplified; in actuality there are quite a few more types, more fields in this struct etc.)

Now, I find myself writing a bunch of utility code to "convert" enum values to actual types and vice versa at compile time. Then I realize I need to do the same at run time as well, and with a variable number of buffers... so now, in addition to type-traits-based lookup of values and enum-template-parameter-based lookup of types, I'm writing code that looks up std::type_infos. It's kind of a mess.

But really, I should not be doing this. It's repetitive, and I am absolutely sure I'm reinventing the wheel: implementing something that has been written many times already in compilers, DBMSes, data file parsers, serialization libraries and so on.

What can I do to minimize my wasted effort on this endeavor?

Notes:

I get these buffers at run time, and cannot just un-erase the type at compile time (e.g. using type traits).
I can't change the API. Or rather, I could change whatever I wanted in my code, but I still get data in this layout in memory.
I don't just take such buffers as input, I also need to produce them as output.
I occasionally need to handle many buffers of different types at once, even a variable number of them (e.g. foo(buffer* buffers, int num_buffers);).
C++11 solutions are preferred over newer-standard-version ones.
I actually use gsl a lot, so you can use it in your answers if you like. As for Boost - that may be politically difficult to depend on, but for the purposes of a StackOverflow question, it's fine, I guess.

Upvotes: 0

Answers (4)

Yakk - Adam Nevraumont

Reputation: 275270

Use boost::variant and gsl::span.

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP };
struct buffer {
  data_type element_type;
  size_t    size; // in elements of element_type, not bytes
  void*     data;
};

template<class...Ts>
using var_span = boost::variant< gsl::span< Ts > ... >;

using buffer_span = var_span< std::int16_t, std::int32_t, std::uint64_t, float, ??? >;

buffer_span to_span( buffer buff ) {
  switch (buff.element_type) {
    case INT16: return gsl::span<std::int16_t>( (std::int16_t*)buff.data, buff.size );
    // etc
  }
}

now you can

auto span = to_span( buff );

and then visit the variant to access the buffer's data in a type-safe way.

Writing visitors is less painful in C++14 thanks to generic [](auto&&) lambdas, but it is doable in C++11.

Writing a template<class...Fs> struct overloaded can also make visitors easier to write. There are myriad implementations out there.
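The following is one possible C++11 version of that overloaded helper, offered as a sketch. C++11 has no class template argument deduction and does not allow pack expansion in using-declarations. For those reasons, this version inherits recursively and adds an overload() factory function. The names overloaded and overload are illustrative choices that match the usage further down.

```cpp
#include <utility>

// Recursive C++11 "overloaded": inherit from every callable and bring each
// base's operator() into one overload set with using-declarations.
template<class... Fs> struct overloaded;

template<class F>
struct overloaded<F> : F {
    overloaded(F f) : F(std::move(f)) {}
    using F::operator();
};

template<class F, class... Fs>
struct overloaded<F, Fs...> : F, overloaded<Fs...> {
    overloaded(F f, Fs... fs)
        : F(std::move(f)), overloaded<Fs...>(std::move(fs)...) {}
    using F::operator();
    using overloaded<Fs...>::operator();
};

// Factory so the lambda closure types are deduced (no CTAD before C++17).
template<class... Fs>
overloaded<Fs...> overload(Fs... fs) {
    return overloaded<Fs...>(std::move(fs)...);
}
```

Ordinary overload resolution then picks the best-matching lambda for each argument type.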

If you cannot use boost you can convert to_span to visit_span and have it take a visitor.

If you cannot use gsl, writing your own span is trivial.
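Here is a minimal C++11 sketch of that Boost-free route, with a hand-written span standing in for gsl::span. visit_span switches on the tag once and passes a typed span to the visitor. Several details are assumptions and are not part of the original answer: mapping TIMESTAMP to std::time_t, throwing on an unknown tag, and the example visitor sum_as_double.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <stdexcept>

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP };
struct buffer {
    data_type   element_type;
    std::size_t size;  // in elements of element_type, not bytes
    void*       data;
};

// Minimal non-owning view, standing in for gsl::span.
template<class T>
class span {
public:
    span(T* ptr, std::size_t n) : ptr_(ptr), n_(n) {}
    T* begin() const { return ptr_; }
    T* end() const { return ptr_ + n_; }
    std::size_t size() const { return n_; }
    T& operator[](std::size_t i) const { return ptr_[i]; }
private:
    T* ptr_;
    std::size_t n_;
};

// The only place that inspects the run-time tag; each case hands a typed span
// to the visitor. Assumption: TIMESTAMP is stored as std::time_t.
template<class Visitor>
void visit_span(const buffer& buff, Visitor&& vis) {
    switch (buff.element_type) {
    case INT16:     vis(span<std::int16_t>(static_cast<std::int16_t*>(buff.data), buff.size)); break;
    case INT32:     vis(span<std::int32_t>(static_cast<std::int32_t*>(buff.data), buff.size)); break;
    case UINT64:    vis(span<std::uint64_t>(static_cast<std::uint64_t*>(buff.data), buff.size)); break;
    case FLOAT:     vis(span<float>(static_cast<float*>(buff.data), buff.size)); break;
    case TIMESTAMP: vis(span<std::time_t>(static_cast<std::time_t*>(buff.data), buff.size)); break;
    default:        throw std::invalid_argument("visit_span: unknown data_type");
    }
}

// Hypothetical example visitor: one operator() template covers every element type.
struct sum_as_double {
    double* result;
    template<class T>
    void operator()(span<T> s) const {
        double total = 0;
        for (T x : s) total += static_cast<double>(x);
        *result = total;
    }
};
```

Every case in the switch is instantiated, so the visitor must accept all five span types. That is what makes a templated operator() or a full overload set a natural fit.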

visit_span( buff, overload(
  [](span<int16_t> span) { /* code */ },
  [](span<int32_t> span) { /* code */ },
  // ...
));

struct do_foo {
  template<class T>
  void operator()(span<T> span) { /* code */ }
};
visit_span( buff, do_foo{captures} );

Upvotes: 1

Passer By

Reputation: 21131

how not to reinvent the wheel?

Simply use std::variant (C++17) along with conversions back and forth. It's in the standard library for a reason.

As for reinventing the wheel: visitation is the simplest generic mechanism for handling type-erased data.

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP, size };

template<data_type d>
struct data
{
    using type = void;
};
template<>
struct data<INT16>
{
    using type = int16_t;
};
// and so on

template<data_type d>
using data_t = typename data<d>::type;


template<typename F, typename T>
void indirect(void* f, void* t, int n)
{
    (*(F*)f)((T*)t, n);
}

template<typename F, size_t... Is>
void visit_(F&& f, buffer* bufs, int n, std::index_sequence<Is...>)
{
    using rF = typename std::remove_reference<F>::type;
    using f_t = void(*)(void*, void*, int);
    static constexpr f_t fs[] = {indirect<rF, data_t<data_type(Is)>>...};
    for(int i = 0; i < n; i++)
        fs[bufs[i].element_type](&f, bufs[i].data, bufs[i].size);
}

template<typename F>
void visit(F&& f, buffer* bufs, int n)
{
    visit_(std::forward<F>(f), bufs, n, std::make_index_sequence<data_type::size>{});
}

std::index_sequence and friends (C++14) can be implemented relatively easily in C++11. Use it as follows:

struct printer
{
    template<typename T>
    void operator()(T* t, int n)
    {
        for(int i = 0; i < n; i++)
            std::cout << t[i] << ' ';
        std::cout << '\n';
    }
};

void foo()
{
    visit(printer{}, nullptr, 0);
}
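Here is a sketch of the C++11 replacement for std::index_sequence and std::make_index_sequence mentioned above, using simple linear recursion. The compat namespace is a hypothetical name, chosen to avoid clashing with std.

```cpp
#include <cstddef>
#include <type_traits>

namespace compat {

// Compile-time list of indices, like C++14's std::index_sequence.
template<std::size_t... Is>
struct index_sequence {};

// Count N down to 0, prepending N-1 at each step:
// impl<3> -> impl<2, 2> -> impl<1, 1, 2> -> impl<0, 0, 1, 2>.
template<std::size_t N, std::size_t... Is>
struct make_index_sequence_impl : make_index_sequence_impl<N - 1, N - 1, Is...> {};

// Base case: N reached 0, the accumulated pack is the answer.
template<std::size_t... Is>
struct make_index_sequence_impl<0, Is...> {
    using type = index_sequence<Is...>;
};

template<std::size_t N>
using make_index_sequence = typename make_index_sequence_impl<N>::type;

} // namespace compat
```

To use it in the code above, visit_ would take a compat::index_sequence<Is...> parameter, and visit would pass compat::make_index_sequence<data_type::size>{}. The recursion depth grows with N, which is harmless for an enum with a modest number of values.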

Upvotes: 2

Matthieu Brucher

Reputation: 22023

This seems to be what type_traits are used for (https://en.cppreference.com/w/cpp/types).

Basically, you define a class template that is empty by default, and you specialize it for each enumerator. Then, in your code, you use MyTypeTraits<MyEnumValue>::type to get the type associated with the enumerator you want.

Everything is resolved at compile time. If you need run-time information, you can still dispatch on the enum value at run time (for instance, if you store the enum alongside the data) and call the matching template instantiation.
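The sketch below shows the traits in both directions (enum to type, and type to enum, which helps when producing buffers), plus a small run-time dispatch that reuses them. The names type_of, data_type_of and element_size are illustrative. The timestamp struct is a hypothetical stand-in for whatever TIMESTAMP really is; a distinct wrapper type keeps the reverse mapping unambiguous.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP };

// Hypothetical representation of TIMESTAMP; replace with the real layout.
struct timestamp { std::int64_t ticks; };

// enum value -> type. The primary template is empty, so an unmapped value
// fails to compile when ::type is requested.
template<data_type D> struct type_of {};
template<> struct type_of<INT16>     { using type = std::int16_t; };
template<> struct type_of<INT32>     { using type = std::int32_t; };
template<> struct type_of<UINT64>    { using type = std::uint64_t; };
template<> struct type_of<FLOAT>     { using type = float; };
template<> struct type_of<TIMESTAMP> { using type = timestamp; };

template<data_type D>
using type_of_t = typename type_of<D>::type;

// type -> enum value. The primary template is left undefined, so unmapped types do not compile.
template<class T> struct data_type_of;
template<> struct data_type_of<std::int16_t>  : std::integral_constant<data_type, INT16> {};
template<> struct data_type_of<std::int32_t>  : std::integral_constant<data_type, INT32> {};
template<> struct data_type_of<std::uint64_t> : std::integral_constant<data_type, UINT64> {};
template<> struct data_type_of<float>         : std::integral_constant<data_type, FLOAT> {};
template<> struct data_type_of<timestamp>     : std::integral_constant<data_type, TIMESTAMP> {};

// Run-time dispatch on a stored enum value, reusing the compile-time table.
inline std::size_t element_size(data_type t) {
    switch (t) {
    case INT16:     return sizeof(type_of_t<INT16>);
    case INT32:     return sizeof(type_of_t<INT32>);
    case UINT64:    return sizeof(type_of_t<UINT64>);
    case FLOAT:     return sizeof(type_of_t<FLOAT>);
    case TIMESTAMP: return sizeof(type_of_t<TIMESTAMP>);
    }
    throw std::invalid_argument("element_size: unknown data_type");
}
```

A function like element_size is handy for turning a buffer's element count into a byte count.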

Upvotes: 1

Max Langhof

Reputation: 23681

The goal here should be to get back into the C++ type system as fast as possible. To do this, there should be one central function that switches based on the (runtime) data_type and then hands off each case to a (compile-time) template version.

You haven't said what the associated functions look like, but here is an example:

template<typename T>
struct TypedBuffer
{
  TypedBuffer(void* data, size_t elementCount) { /* ... */ }
  // ...
};

template<typename T>
void handleBufferTyped(void* data, size_t elementCount)
{
  TypedBuffer<T> buf(data, elementCount);
  // Do whatever you want - you're back in the type system.
}

void handleBuffer(buffer buf)
{
  switch (buf.element_type)
  {
  case INT16:     handleBufferTyped<int16_t>(buf.data, buf.size); break;
  case INT32:     handleBufferTyped<int32_t>(buf.data, buf.size); break;
  case UINT64:    handleBufferTyped<uint64_t>(buf.data, buf.size); break;
  case FLOAT:     handleBufferTyped<float>(buf.data, buf.size); break;
  case TIMESTAMP: handleBufferTyped<std::time_t>(buf.data, buf.size); break;
  }
}

If needed, you can also have TypedBuffer inherit from a non-templated base class so you can return from handleBuffer polymorphically, but that's mixing a lot of paradigms and probably unnecessary.
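Here is a rough C++11 sketch of that non-templated base class variant. It assumes three things: TypedBuffer is a non-owning view over the caller's memory, the type-independent interface consists of size() and print(), and TIMESTAMP maps to std::time_t as in the switch above. TypedBufferBase, makeTyped and toString are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP };
struct buffer {
    data_type   element_type;
    std::size_t size;  // in elements of element_type, not bytes
    void*       data;
};

// Non-templated interface: whatever callers need without knowing T.
struct TypedBufferBase {
    virtual ~TypedBufferBase() {}
    virtual std::size_t size() const = 0;
    virtual void print(std::ostream& os) const = 0;
};

// Assumption: a non-owning view; the caller keeps the memory alive.
template<typename T>
struct TypedBuffer : TypedBufferBase {
    TypedBuffer(void* data, std::size_t elementCount)
        : data_(static_cast<T*>(data)), count_(elementCount) {}
    std::size_t size() const override { return count_; }
    void print(std::ostream& os) const override {
        for (std::size_t i = 0; i < count_; ++i)
            os << data_[i] << ' ';
    }
    T* data_;
    std::size_t count_;
};

template<typename T>
std::unique_ptr<TypedBufferBase> makeTyped(void* data, std::size_t n) {
    // std::make_unique only arrived in C++14.
    return std::unique_ptr<TypedBufferBase>(new TypedBuffer<T>(data, n));
}

// The same single switch, now returning a polymorphic object.
std::unique_ptr<TypedBufferBase> handleBuffer(buffer buf) {
    switch (buf.element_type) {
    case INT16:     return makeTyped<std::int16_t>(buf.data, buf.size);
    case INT32:     return makeTyped<std::int32_t>(buf.data, buf.size);
    case UINT64:    return makeTyped<std::uint64_t>(buf.data, buf.size);
    case FLOAT:     return makeTyped<float>(buf.data, buf.size);
    case TIMESTAMP: return makeTyped<std::time_t>(buf.data, buf.size);
    }
    throw std::invalid_argument("handleBuffer: unknown data_type");
}

// Convenience: render any buffer as text via the virtual print().
inline std::string toString(const TypedBufferBase& b) {
    std::ostringstream os;
    b.print(os);
    return os.str();
}
```

Every operation added this way becomes another virtual function. This is the mixing of paradigms the answer warns about, so the plain template path is usually simpler.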

Upvotes: 4
