c++ c++11 types programming-languages valgrind

Reputation: 2135

C++ and dynamically typed languages

Today I talked to a friend about the differences between statically and dynamically typed languages (there is more information about the difference in this SO question). Afterwards, I wondered what kind of trick could be used in C++ to emulate such dynamic behavior.

In C++, as in other statically typed languages, the type of a variable is fixed at compile time. For example, say I have to read a large number of values from a file. Most of them are quite small, small enough to fit in an unsigned short. Here comes the tricky part: a small number of these values are much bigger, big enough to need an unsigned long long.

Since I assume I'm going to do calculations with all of them, I want them stored in the same container, in consecutive positions of memory, in the same order as I read them from the input file. The naive approach would be to store them in a vector of unsigned long long, but this typically uses up to 4 times the space actually needed (unsigned short is typically 2 bytes, unsigned long long 8 bytes).

In dynamically typed languages, the type of a variable is interpreted at runtime and coerced to a type where it fits. How can I achieve something similar in C++?

My first idea is to use pointers: depending on its size, I store each number with the appropriate type. This has the obvious drawback of also having to store the pointer, but since I assume I'm going to store the numbers on the heap anyway, I don't think it matters.

I'm totally sure that many of you can give me way better solutions than this ...

#include <iostream>
#include <vector>
#include <limits>
#include <sstream>
#include <fstream>

int main() {
    std::ifstream f ("input_file");
    if (f.is_open()) {
        std::vector<void*> v;
        unsigned long long int num;
        while(f >> num) {
            if (num > std::numeric_limits<unsigned short>::max()) {
                v.push_back(new unsigned long long int(num));
            }
            else {
                v.push_back(new unsigned short(num));
            }
        }
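        for (auto i: v) {
            // NOTE: deleting through a void* is undefined behavior; the
            // pointer no longer records which type was allocated.
            delete i;
        }
        f.close();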
    }
}

Edit 1: The question is not about saving memory. I know that in dynamically typed languages the space needed to store the numbers in the example is going to be far more than in C++. The question is about emulating a dynamically typed language with some C++ mechanism.

Upvotes: 1

Answers (4)

ysdx

Reputation: 9335

You could create a class for storing dynamic values:

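#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>
#include <utility>
#include <vector>

enum class dyn_type {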
  none_type,
  integer_type,
  fp_type,
  string_type,
  boolean_type,
  array_type,
  // ...
};

class dyn {
  dyn_type type_ = dyn_type::none_type;
  // Unrestricted union:
  union {
    std::int64_t integer_value_;
    double fp_value_;
    std::string string_value_;
    bool boolean_value_;
    std::vector<dyn> array_value_;
  };
public:
  // Constructors
  dyn()
  {
     type_ = dyn_type::none_type;
  }
  dyn(std::nullptr_t) : dyn() {}
  dyn(bool value)
  {
     type_ = dyn_type::boolean_type;
     boolean_value_ = value;
  }
  dyn(std::int32_t value)
  {
     type_ = dyn_type::integer_type;
     integer_value_ = value;
  }
  dyn(std::int64_t value)
  {
     type_ = dyn_type::integer_type;
     integer_value_ = value;
  }
  dyn(double value)
  {
     type_ = dyn_type::fp_type;
     fp_value_ = value;
  }
  dyn(const char* value)
  {
     type_ = dyn_type::string_type;
     new (&string_value_) std::string(value);
  }
  dyn(std::string const& value)
  {
     type_ = dyn_type::string_type;
     new (&string_value_) std::string(value);
  }
  dyn(std::string&& value)
  {
     type_ = dyn_type::string_type;
     new (&string_value_) std::string(std::move(value));
  }
  // ....

  // Clear
  void clear()
  {
     switch(type_) {
     case dyn_type::string_type:
       string_value_.std::string::~string();
       break;
     //...
     }
     type_ = dyn_type::none_type;
  }
  ~dyn()
  {
    this->clear();
  }

  // Copy:
  dyn(dyn const&);
  dyn& operator=(dyn const&);

  // Move:
  dyn(dyn&&);
  dyn& operator=(dyn&&);

  // Assign:
  dyn& operator=(std::nullptr_t);
  dyn& operator=(std::int64_t);
  dyn& operator=(double);
  dyn& operator=(bool);   

  // Operators:
  dyn operator+(dyn const&) const;
  dyn& operator+=(dyn const&);
  // ...

  // Query
  dyn_type type() const { return type_; }
  std::string& string_value()
  {
     assert(type_ == dyn_type::string_type);
     return string_value_;
  }
  // ....

  // Conversion
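  explicit operator bool() const
  {
    switch(type_) {
    case dyn_type::none_type:
      return false;  // an empty value is falsy, as in most dynamic languages
    case dyn_type::integer_type:
      return integer_value_ != 0;
    case dyn_type::fp_type:
      return fp_value_ != 0.0;
    case dyn_type::boolean_type:
      return boolean_value_;
    // ...
    }
    // Fallback for types not handled above; without it, control would run
    // off the end of the function, which is undefined behavior.
    return true;
  }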
  // ...
};

Used with:

std::vector<dyn> xs;
xs.push_back(3);
xs.push_back(2.0);
xs.push_back("foo");
xs.push_back(false);

Upvotes: 1

Tony Delroy

Reputation: 106236

Options include...

Discriminated union

The code specifies a set of distinct, supported types T0, T1, T2, T3..., and, conceptually, creates a management type along these lines:

struct X
{
    enum { F0, F1, F2, F3... } type_;
    union { T0 t0_; T1 t1_; T2 t2_; T3 t3_; ... };
};

There are limitations on the types that can be placed into unions. If those limitations are bypassed using placement new, care needs to be taken to ensure adequate alignment and correct destructor invocation, so a generalised implementation becomes more complicated, and it's normally better to use boost::variant<>. Note that the type_ field requires some space, the union will be at least as large as the largest of sizeof t0_, sizeof t1_..., and padding may be required.
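For the question's two integer widths, a hand-written discriminated union might look like the sketch below. The names Number, make_number and value_of are hypothetical, chosen only for illustration. Because both members are trivial types, none of the placement-new or destructor concerns apply here.

```cpp
#include <limits>

// Hypothetical tagged union for the question's two integer widths.
struct Number {
    enum Kind { Short, Long } kind;  // discriminator: which member is active
    union {
        unsigned short     s;
        unsigned long long l;
    };
};

// Store the value in the narrowest member that can hold it.
inline Number make_number(unsigned long long v) {
    Number n;
    if (v <= std::numeric_limits<unsigned short>::max()) {
        n.kind = Number::Short;
        n.s = static_cast<unsigned short>(v);
    } else {
        n.kind = Number::Long;
        n.l = v;
    }
    return n;
}

// Read the value back, widening to the common type.
inline unsigned long long value_of(const Number& n) {
    return n.kind == Number::Short ? n.s : n.l;
}
```

Note that sizeof(Number) can never be smaller than sizeof(unsigned long long), and with the tag and padding it is usually larger, so a std::vector<Number> emulates the dynamic behaviour but does not save memory compared with a plain std::vector<unsigned long long>.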

std::type_info

It's also possible to have a templated constructor and assignment operator that call typeid and record the std::type_info, allowing future operations like "recover-the-value-if-it's-of-a-specific-type". The easiest way to pick up this behaviour is to use boost::any.
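To make the idea concrete, here is a much simplified, move-only holder in the spirit of boost::any. It is not boost::any's actual implementation; any_value, holder_base, holder and get are hypothetical names, and using typeid(void) to mean "empty" is an assumption of this sketch.

```cpp
#include <memory>
#include <typeinfo>
#include <utility>

// Hypothetical, simplified any-like holder (not boost::any itself).
class any_value {
    struct holder_base {
        virtual ~holder_base() {}
        virtual const std::type_info& type() const = 0;
    };
    template <typename T>
    struct holder : holder_base {
        explicit holder(T v) : value(std::move(v)) {}
        // Record the static type that was stored.
        const std::type_info& type() const override { return typeid(T); }
        T value;
    };
    std::unique_ptr<holder_base> held_;

public:
    any_value() {}

    // Templated constructor: wraps any copyable or movable value.
    template <typename T>
    any_value(T v) : held_(new holder<T>(std::move(v))) {}

    // typeid(void) stands in for "no value stored".
    const std::type_info& type() const {
        return held_ ? held_->type() : typeid(void);
    }

    // Returns a pointer to the value if it is exactly a T, otherwise nullptr.
    template <typename T>
    T* get() {
        if (!held_ || held_->type() != typeid(T)) return nullptr;
        return &static_cast<holder<T>*>(held_.get())->value;
    }
};
```

With this sketch, any_value(42).get<int>() yields a pointer to the stored int, while get<double>() on the same object yields nullptr: the type check is exact, with no implicit conversions.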

Run-time polymorphism

You can create a base type with virtual destructor and whatever functions you need (e.g. virtual void output(std::ostream&)), then derive a class for each of short and long long. Store pointers to the base class.
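A minimal sketch of that layout follows, with hypothetical names (NumberBase, ShortNumber, LongNumber, make_boxed_number, join). It uses std::unique_ptr rather than raw pointers so that each object is destroyed through the virtual destructor automatically.

```cpp
#include <cstddef>
#include <limits>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct NumberBase {
    virtual ~NumberBase() {}  // makes deletion through the base pointer well-defined
    virtual void output(std::ostream& os) const = 0;
};

struct ShortNumber : NumberBase {
    explicit ShortNumber(unsigned short v) : value(v) {}
    void output(std::ostream& os) const override { os << value; }
    unsigned short value;
};

struct LongNumber : NumberBase {
    explicit LongNumber(unsigned long long v) : value(v) {}
    void output(std::ostream& os) const override { os << value; }
    unsigned long long value;
};

// Choose the derived class from the run-time magnitude of the value.
inline std::unique_ptr<NumberBase> make_boxed_number(unsigned long long v) {
    if (v <= std::numeric_limits<unsigned short>::max())
        return std::unique_ptr<NumberBase>(
            new ShortNumber(static_cast<unsigned short>(v)));
    return std::unique_ptr<NumberBase>(new LongNumber(v));
}

// Print every element through the virtual interface, separated by spaces.
inline std::string join(const std::vector<std::unique_ptr<NumberBase>>& v) {
    std::ostringstream os;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) os << ' ';
        v[i]->output(os);
    }
    return os.str();
}
```

Compared with a std::vector<void*>, the elements keep their dynamic type, so both virtual calls and destruction do the right thing. The cost is a vtable pointer per object plus one heap allocation per number.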

Custom solutions

In your particular scenario, you've only got a few large numbers, so you could reserve one of the short values as a sentinel indicating that the actual value at this position can be recreated by bitwise shifting and ORing the following 4 values. For example...

10 299 32767 0 0 192 3929 38

...could encode:

10
299
// 32767 is a sentinel indicating next 4 values encode long long
(0 << 48) + (0 << 32) + (192 << 16) + 3929
38

The concept here is similar to UTF-8 encoding for international character sets. This will be very space efficient, but it suits forward iteration, not random access indexing a la [123].
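The scheme can be sketched as a pair of functions. This sketch assumes a 16-bit unsigned short and a 64-bit unsigned long long, and reuses 32767 as the sentinel from the example above; encode, decode and kSentinel are hypothetical names. One detail the example leaves implicit is that the sentinel value itself, if it occurs in the data, must also be written in the escaped form.

```cpp
#include <cstddef>
#include <vector>

// Assumption: 32767 is reserved as the escape marker, as in the example.
const unsigned short kSentinel = 32767;

// Small values are stored directly; anything else (including the sentinel
// value itself) is written as the sentinel followed by four 16-bit chunks,
// most significant first.
inline std::vector<unsigned short> encode(const std::vector<unsigned long long>& in) {
    std::vector<unsigned short> out;
    for (unsigned long long v : in) {
        if (v <= 0xFFFFu && v != kSentinel) {
            out.push_back(static_cast<unsigned short>(v));
        } else {
            out.push_back(kSentinel);
            for (int shift = 48; shift >= 0; shift -= 16)
                out.push_back(static_cast<unsigned short>((v >> shift) & 0xFFFFu));
        }
    }
    return out;
}

// Forward-only decoding: the position of element n is not known without
// scanning everything before it.
inline std::vector<unsigned long long> decode(const std::vector<unsigned short>& in) {
    std::vector<unsigned long long> out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != kSentinel) {
            out.push_back(in[i]);
            ++i;
        } else {
            if (in.size() - i < 5) break;  // truncated input: stop (a real decoder might report an error)
            unsigned long long v = 0;
            for (std::size_t k = 1; k <= 4; ++k)
                v = (v << 16) | in[i + k];  // shift in the next 16-bit chunk
            out.push_back(v);
            i += 5;
        }
    }
    return out;
}
```

Encoding the values 10, 299, 12586841 and 38 with this sketch produces exactly the sequence 10 299 32767 0 0 192 3929 38 shown above, since 12586841 equals (192 << 16) + 3929.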

Upvotes: 6

Mark B

Reputation: 96301

The normal way of dynamic typing in C++ is a boost::variant or a boost::any.

But in many cases you don't want to do that. C++ is a great statically typed language, and trying to force it to be dynamically typed is just not a good use of it (especially not to save memory). Use an actual dynamically typed language instead, as it is very likely better optimized (and easier to read) for that use case.

Upvotes: -1

Cheers and hth. - Alf

Reputation: 145429

An easy way to get dynamic language behavior in C++ is to use a dynamic language engine, e.g. for JavaScript.

Or, for example, the Boost library provides an interface to Python.

Possibly that will deal with a collection of numbers in a more efficient way than you could do yourself, but still it's extremely inefficient compared to just using an appropriate single common type in C++.

Upvotes: 0