Timothy Baldridge

Reputation: 10683

Tagged unions with runtime defined members

I'm working on a small interpreter and I'd like to represent some types on the stack with others being pointers. Here's what it would look like in C++:

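#include <cstddef>  // size_t

enum {
  NIL_TYPE,
  INT_TYPE,
  REF_TYPE_START,  // first id handed out to runtime-defined types
};

union Data
{
  int int_val;
  void *obj_val;
};

struct Object
{
  size_t _type_id;  // tag selecting which member of _data is valid
  Data _data;
};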

_type_id acts as a tag for the rest of the struct. Small values such as integers, booleans, and nils can be passed on the stack, while larger things like strings and objects are passed by reference.

The interpreter will create new types at runtime, which is what REF_TYPE_START is for. When a new type is created, we increment an internal counter; the resulting value becomes the new type's id, and every value of that type is expected to be a pointer.

How can I represent something like this in Rust? Enum types seem awesome, but they don't seem to allow extension. Untagged unions seem to be very much a WIP and not much help. Is there any way I can get this sort of on-stack behavior (thereby reducing a ton of allocations during math operations), while still allowing for runtime extension?

Upvotes: 1

Views: 322

Answers (1)

Shepmaster

Reputation: 431689

It sounds like you want something like

enum Object {
    Nil,
    Int(i32),
    Runtime(TypeId, RuntimeType),
}

You could ensure that RuntimeType contains only a pointer, or box it immediately (Runtime(TypeId, Box<RuntimeType>)); either way, the end result is the same.

If it contains a Box and TypeId is 64 bits wide, this enum takes up 24 bytes on a 64-bit machine: the discriminant (padded to 8 bytes), the TypeId, and the pointer. Unfortunately, there's no way I'm aware of to inform the compiler that the TypeId and the enum's discriminant should inhabit the same location. You could instead move the TypeId into the Box<RuntimeType> if your measurements show that the extra dereference costs less than the extra stack size. This is all very malleable depending on what other types you embed directly into the enum. For example, a Vec takes three pointers' worth of stack space; if one were included as a variant, the enum would already be that large, so you could inline more values at no extra cost.
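To compare the two layouts discussed above, something like the following could be used to print the actual sizes on your target. This is only a sketch: the `Inline`, `Boxed`, and `Tagged` names and the `fields` payload are invented for illustration, and Rust does not guarantee enum layout, so measure rather than rely on specific numbers.

```rust
use std::mem::size_of;

type TypeId = u64;

// Hypothetical payload standing in for whatever a runtime object holds.
#[allow(dead_code)]
struct RuntimeType {
    fields: Vec<i64>,
}

// Layout A: the TypeId sits inline, next to the boxed payload.
#[allow(dead_code)]
enum Inline {
    Nil,
    Int(i32),
    Runtime(TypeId, Box<RuntimeType>),
}

// Layout B: the TypeId lives behind the pointer, alongside the payload.
#[allow(dead_code)]
struct Tagged {
    type_id: TypeId,
    payload: RuntimeType,
}

#[allow(dead_code)]
enum Boxed {
    Nil,
    Int(i32),
    Runtime(Box<Tagged>),
}

// Returns (size of layout A, size of layout B) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Inline>(), size_of::<Boxed>())
}

fn main() {
    let (inline, boxed) = sizes();
    println!("TypeId inline: {inline} bytes, TypeId behind the Box: {boxed} bytes");
}
```

Layout B trades a smaller stack value for an extra dereference whenever the type id of a runtime object is needed, which is the trade-off to benchmark.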

The trick becomes: what is RuntimeType? You haven't described the problem enough for me to guess. It could be a concrete type, or it might end up being a boxed trait object.
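If RuntimeType does end up being a boxed trait object, it might look roughly like this. The `RuntimeObject` trait, its `describe` method, and the `Str` type are all hypothetical; they only stand in for whatever behaviour the interpreter needs from heap-allocated values.

```rust
type TypeId = u64;

// Hypothetical trait for anything the interpreter allocates on the heap.
trait RuntimeObject {
    fn describe(&self) -> String;
}

// Hypothetical runtime-defined type: a heap-allocated string.
struct Str(String);

impl RuntimeObject for Str {
    fn describe(&self) -> String {
        format!("string of length {}", self.0.len())
    }
}

enum Object {
    Nil,
    Int(i32),
    // The payload is a trait object, so any implementor can be stored here.
    Runtime(TypeId, Box<dyn RuntimeObject>),
}

// Dispatches on the enum first, then dynamically on the trait object.
fn describe(obj: &Object) -> String {
    match obj {
        Object::Nil => "nil".to_string(),
        Object::Int(n) => format!("int {n}"),
        Object::Runtime(id, inner) => format!("type {id}: {}", inner.describe()),
    }
}

fn main() {
    let values = vec![
        Object::Nil,
        Object::Int(7),
        Object::Runtime(2, Box::new(Str("hi".into()))),
    ];
    for value in &values {
        println!("{}", describe(value));
    }
}
```

Note that a `Box<dyn Trait>` is a fat pointer (data pointer plus vtable pointer), so this variant is wider than one holding a plain `Box<RuntimeType>`.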

A slightly more complete example:

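// Placeholder for whatever a runtime-defined value actually holds.
struct RuntimeType;
type TypeId = u64;

enum Object {
    Nil,
    Int(i32),
    // Runtime-defined types carry their own id next to the payload.
    Runtime(TypeId, RuntimeType),
}

impl Object {
    // Built-in variants get fixed ids; runtime types report the stored one.
    fn type_id(&self) -> TypeId {
        use Object::*;

        match *self {
            Nil => 0,
            Int(..) => 1,
            Runtime(id, ..) => id,
        }
    }
}

fn main() {}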
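The runtime extension part (handing out a fresh id each time the interpreter defines a type) can live outside the enum entirely. A minimal sketch, assuming ids 0 and 1 are reserved for Nil and Int as in `type_id` above; the `TypeRegistry` type and its methods are hypothetical:

```rust
type TypeId = u64;

const NIL_TYPE: TypeId = 0;
const INT_TYPE: TypeId = 1;
// First id available to runtime-defined types, mirroring the C++ enum.
const REF_TYPE_START: TypeId = 2;

// Hypothetical registry: remembers the name of every runtime-defined type.
struct TypeRegistry {
    names: Vec<String>,
}

impl TypeRegistry {
    fn new() -> Self {
        Self { names: Vec::new() }
    }

    // Allocates the next free id and records the type's name under it.
    fn register(&mut self, name: &str) -> TypeId {
        let id = REF_TYPE_START + self.names.len() as TypeId;
        self.names.push(name.to_string());
        id
    }

    // Looks up a type name; returns None for ids that were never handed out.
    fn name_of(&self, id: TypeId) -> Option<&str> {
        match id {
            NIL_TYPE => Some("nil"),
            INT_TYPE => Some("int"),
            _ => self
                .names
                .get((id - REF_TYPE_START) as usize)
                .map(String::as_str),
        }
    }
}

fn main() {
    let mut registry = TypeRegistry::new();
    let string_id = registry.register("string");
    println!("{string_id} -> {:?}", registry.name_of(string_id));
}
```

The id returned by `register` is what would be stored in the `Runtime` variant when a value of that type is created.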

Upvotes: 3
