Using typeid for simple decisions

Question

My question concerns simple catching of casting problems at runtime in C++. I understand that C++ provides no 'RTTI' under most circumstances (let's just say I can't change my compiler settings and turn it on, for argument's sake.).

In my project, I have a container provided to me by an ancient library. Let's call that Frodo and the container Bane.

Frodo's Bane is a container I must interact with directly. It has been poorly implemented and contains several maps of the same information which I have to manage separately. The container contains multiple instances of a certain type, call it Ring.

For my implementation, I have to use a subtype of Ring, the OneRing, for all other operations. This is due to the external requirements for my particular implementation of a Frodo's Bane.

I must sometimes look through my lists and let's just say that there are things I need from a OneRing that I can't get from just any old Ring.

There is a CHANCE, just an occasional CHANCE, that Frodo will put a normal old Ring into my Bane.

Being far too Pepsi gen to have realised that C++ was incapable of answering a simple question like 'is this Ring also a OneRing' or to catch exceptions arising from mistaking a Ring for a OneRing, I did a bit of Googling and found that there are things like typeid and type_of, but that there are lots of reasons not to use each.

So, limiting the problem right down to simply wanting to avoid mistaking a Ring for a OneRing, is there ANYTHING I can safely and efficiently do, WITHOUT being able to change the implementation of Ring.

Thanks heaps for any advice!

UPDATE: Ok there is plenty of good advice below but after a day of experimentation I can say the following things about RTTI in native Visual Studio 2008 C++:

C++ only has RTTI on pointers to classes. You can't dynamic cast a void* and if you ask for the type info it will be void*. The RTTI is 'on' the pointer. I understand why you don't 'cast' a member (from C++'s crazy viewpoint, in languages without members you don't have to make such an artificial distinction), and perhaps these facts are related for some archaic efficiency/compiler complexity reason.
If you get the type info of a Ring* to a OneRing, it will be Ring*. Why they chose not to put the information on the classes I have no idea. This makes typeid() essentially useless - it even returns a private class with only one member you can access, a char* name! That's like 11/10 on the unfriendliness and uselessness scale - did they do it on purpose?
You can cast down the inheritance tree safely with a dynamic cast but again you can't do this unless you have and know a common inheritance ancestor for any data you receive.
If you fall outside the above scenario, or you don't have RTTI for whatever reason, you are up the proverbial creek, because there is no way to catch an exception or generate an error resulting from an incorrect casting unless you implement some form of RTTI yourself.
Things like COM seek to give more useful RTTI by giving everything a common base that has some predeclared RTTI information on it... I think. :-)

As a side note, Microsoft states that (at least since 2008) RTTI is a part of C++ and is on by default. It is unbelievable how much conflicting information on this alone I found.

This may be old news to those of you who have spent most of your time in Native Land, but for me this took a session of experimenting, some Googling, some posting on StackOverflow, some more Googling and some more experimentation.

Thanks to everyone who pointed me in the right direction, and please do correct me if you still feel I've missed something. :-)

Matteo Italia · Accepted Answer

If the container stores instances of Ring rather than pointers to them, it can't hold a real OneRing, only one that has been sliced down to a Ring. In that case, you can't know for sure what the object was before being inserted into the container, since now it's a normal Ring for every practical purpose.
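
A minimal sketch of the slicing problem. The Ring and OneRing definitions here are hypothetical stand-ins (the real ones come from the library and are not shown in the question), and Describe is an invented member used only to observe the result:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the library types.
struct Ring {
    virtual ~Ring() {}
    virtual std::string Describe() const { return "Ring"; }
};

struct OneRing : Ring {
    virtual std::string Describe() const { return "OneRing"; }
};

// Stores a OneRing in a container of Ring *values* and reports what comes out.
std::string DescribeAfterStoringByValue() {
    std::vector<Ring> bane;
    OneRing precious;
    bane.push_back(precious);    // copies only the Ring part: the object is sliced
    return bane[0].Describe();   // the stored element is a plain Ring
}
```

Calling DescribeAfterStoringByValue() yields "Ring": once the copy is made, no run-time check can recover the fact that the source object was a OneRing.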

If, instead, the container stores pointers/references to the elements (as I suppose), you can simply try a dynamic_cast of the Ring pointer/reference you get to the type OneRing; if the dynamic_cast succeeds you have your pointer/reference correctly cast to its "real" type OneRing, otherwise you know that it's not a OneRing but just a regular Ring (or some other derived class that isn't OneRing).
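
As a rough sketch, assuming the container behaves like a std::vector of Ring pointers and that Ring has at least one virtual function (both are assumptions, since the library code is not shown; CountOneRings and power are made-up names):

```cpp
#include <cstddef>
#include <vector>

struct Ring { virtual ~Ring() {} };   // hypothetical polymorphic base
struct OneRing : Ring {
    int power;
    OneRing() : power(9000) {}
};

// Counts how many elements of a pointer container are really OneRings.
std::size_t CountOneRings(const std::vector<Ring*>& bane) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bane.size(); ++i) {
        // Yields a null pointer when *bane[i] is not a OneRing (or derived from it).
        if (OneRing* one = dynamic_cast<OneRing*>(bane[i])) {
            (void)one->power;   // OneRing-only members are safe to use here
            ++count;
        }
    }
    return count;
}
```

Elements for which the cast yields a null pointer are simply skipped, which is exactly the "occasional plain Ring" case from the question.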

Obviously dynamic_cast requires RTTI to work correctly (if RTTI is disabled no type information is included into the executable, so nothing is known about types at runtime), so remember to enable it if your compiler disables it by default.

If you can't turn it on, you can't use dynamic_cast or typeid; there's no "alternative" mechanism built into the language, because that would be redundant, and it would need the same information that RTTI needs to work correctly.

The only thing you could do is to reinvent RTTI in your class hierarchy, providing a virtual method overridden to return a different value by each subclass; that way you could test from a Ring pointer what kind of Ring it really is, then you could brutally cast it. Notice that this is cumbersome, difficult to maintain and not safe (the brutal pointer casting may not work in complicated inheritance scenarios), so I strongly advise against using this method. Just use RTTI.
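
For completeness, a minimal sketch of such a hand-rolled scheme. It assumes you are able to edit Ring, which the question rules out for the library's class; Kind, GetKind, AsOneRing and RulesThemAll are hypothetical names:

```cpp
struct Ring {
    enum Kind { PlainRing, TheOneRing };
    virtual ~Ring() {}
    virtual Kind GetKind() const { return PlainRing; }
};

struct OneRing : Ring {
    virtual Kind GetKind() const { return TheOneRing; }
    bool RulesThemAll() const { return true; }
};

// Returns the OneRing behind the pointer, or a null pointer for any other Ring.
OneRing* AsOneRing(Ring* ring) {
    if (ring != 0 && ring->GetKind() == Ring::TheOneRing)
        return static_cast<OneRing*>(ring);   // unchecked: trusts the tag
    return 0;
}
```

Virtual calls do not depend on the RTTI compiler switch, so this keeps working with RTTI disabled, but every new subclass has to remember to override GetKind with a fresh value, and the static_cast is only as trustworthy as that tag.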

By the way, keep in mind that on many occasions a need for dynamic_cast means there's something wrong in your class hierarchy: well-designed class hierarchies tend to achieve their goals without dynamic casting, by using virtual methods correctly.

My question concerns simple catching of casting problems at runtime in C++. I understand that C++ provides no 'RTTI' under most circumstances (let's just say I can't change my compiler settings and turn it on, for argument's sake.).

The C++ standard provides RTTI; compilers merely let you disable it if you don't need it, to save the executable space that the type information would otherwise take up. If you do need RTTI, it's not clear why you would want to disable it.

Being far too Pepsi gen to have realised that C++ was incapable of answering a simple question like 'is this Ring also a OneRing' or to catch exceptions arising from mistaking a Ring for a OneRing

That's exactly the purpose of dynamic_cast. Apply it to a pointer, and if it can't be converted you get a NULL, apply it to a reference and if it can't be converted a std::bad_cast is thrown.
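
A small sketch of the reference form, again with hypothetical Ring/OneRing definitions (PowerOf and Power are invented for the example):

```cpp
#include <typeinfo>   // std::bad_cast

struct Ring { virtual ~Ring() {} };
struct OneRing : Ring {
    int Power() const { return 9000; }
};

// Returns the ring's power, or -1 when the Ring turns out not to be a OneRing.
int PowerOf(Ring& ring) {
    try {
        return dynamic_cast<OneRing&>(ring).Power();
    } catch (const std::bad_cast&) {
        return -1;   // a reference can't be null, so failure is an exception
    }
}
```

The pointer form is usually the more convenient one when a mismatch is an expected, ordinary event; the reference form fits better when a mismatch is a genuine error.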

Addendum

C++ only has RTTI on pointers to classes. You can't dynamic cast a void* and if you ask for the type info it will be void*. The RTTI is 'on' the pointer. I understand why you don't 'cast' a member (from C++'s crazy viewpoint, in languages without members you don't have to make such an artificial distinction), and perhaps these facts are related for some archaic efficiency/compiler complexity reason.

Wait, wait, it doesn't work like that, the RTTI information is not on the pointer, it's somehow on the object. But to understand why it's such a mess I think you need a bit of information about how this stuff works under the cover.

Notice: everything that follows is implementation-specific; the standard does not mandate any particular way to implement virtual functions and RTTI, it's just usually done this way.

In C++ a non-polymorphic class is just a normal structure containing its fields. The private/public distinction is enforced at compile time, and the methods are just normal functions that are called with a hidden this parameter that points to the instance of the class on which they are called. Every method call is resolved at compile time. Inheritance is just a matter of pasting the base class fields before the added fields. Everything is simple, everybody is happy.

If the static type is always the same as the "real" object type, everything works fine. Problems start to arise when you want to have polymorphic behavior: if you store a Derived * into a pointer variable whose static type is Base *, the "static" type and the "real" type are no longer the same. When seeing a call performed on such pointer, the compiler doesn't have any clue about having to call the method of Derived, and just calls the method of Base, since all the information it has is the static type of such pointer.
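
To make the problem concrete, here is a short sketch with non-virtual methods (Base, Derived and NameThroughBasePointer are illustrative names only):

```cpp
#include <string>

struct Base {
    std::string Name() const { return "Base"; }      // non-virtual
};

struct Derived : Base {
    std::string Name() const { return "Derived"; }   // hides Base::Name, does not override it
};

std::string NameThroughBasePointer() {
    Derived d;
    Base* p = &d;        // static type Base *, "real" type Derived
    return p->Name();    // resolved at compile time from the static type
}
```

NameThroughBasePointer() returns "Base", even though the object is a Derived.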

To solve this problem, virtual calls and vtables were invented. When you declare a class method as virtual, your class becomes a polymorphic class, i.e. it allows polymorphic behavior on methods declared as virtual.

It works like this. For each class, the compiler creates a table of function pointers (the so-called "vtable"), that is put somewhere in the executable image; there's one "row" for each virtual method. The vtable of the base class will contain pointers to the Base implementation of the virtual methods, while the derived classes will have their vtable, containing the pointers to their versions of the methods. vtables of derived classes may also be bigger because of additional methods, but the methods present also in the base class are at the beginning, and the indexes for common methods in Derived vtable match the ones of Base.

Now, each object of a polymorphic class has an additional member, the virtual table pointer (usually called vptr, usually put at the beginning of the class). This member is automagically initialized just before each constructor is run to point to the correct vtable for the object. Notice that this is the reason why, when the base class constructor of a polymorphic type is run, the virtual methods do not work "correctly" (i.e. they work as if the class were of Base type): the derived constructor hasn't run yet, so the vptr still points to the Base vtable.
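
The constructor behaviour is easy to observe. In this sketch (hypothetical classes; seen_in_ctor is just a field used to record the result) the base constructor makes a virtual call and stores what it got:

```cpp
#include <string>

struct Base {
    std::string seen_in_ctor;
    Base() { seen_in_ctor = Name(); }   // runs while the object is still "a Base"
    virtual ~Base() {}
    virtual std::string Name() const { return "Base"; }
};

struct Derived : Base {
    virtual std::string Name() const { return "Derived"; }
};
```

After constructing a Derived, seen_in_ctor holds "Base", while calling Name() on the finished object gives "Derived".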

When the object is completely constructed, its "real" type is now much more determined than for a non-polymorphic object: if you have a Derived * stored in a Base *, the compiler will now be able to call the correct versions of virtual methods: each virtual call is implemented as a lookup in the virtual table (which can be accessed, since vptr is present both in Base and in Derived at the same position), followed by a call to the function whose pointer is stored at the correct location of the vtable (the index for a particular virtual function is the same in the vtable of every derived class). Since the called function "knows" that it's operating on a Derived instance, it can then access all the members of the Derived class.
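
The mechanism can be imitated by hand. The following is only an illustration of the idea, written with plain structs and function pointers; real compilers differ in the details, and every name in it (VTable, CallName, kBaseVTable, ...) is made up for the example:

```cpp
#include <string>

// One "row" per virtual method; here a single Name slot.
struct VTable {
    std::string (*name)(const void* self);
};

struct Base    { const VTable* vptr; };     // vptr first, as is typical
struct Derived { Base base; int extra; };   // base fields pasted before the added ones

std::string BaseName(const void*)    { return "Base"; }
std::string DerivedName(const void*) { return "Derived"; }

const VTable kBaseVTable    = { &BaseName };
const VTable kDerivedVTable = { &DerivedName };

// What a virtual call boils down to: load the vptr, pick the slot, call through it.
std::string CallName(const Base* obj) {
    return obj->vptr->name(obj);
}
```

A Derived initialised with &kDerivedVTable and passed to CallName through a pointer to its Base part dispatches to DerivedName, with no knowledge of Derived needed at the call site.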

This is more or less how virtual functions work for single inheritance in a "simple" polymorphic class hierarchy. With multiple inheritance or when mixing polymorphic classes with non-polymorphic ones things start to get messy (you can have multiple vptrs, the class layout must follow some constraints, ...), but this is not relevant for us.

Now we have virtual functions working. typeid is just a small step from here.

RTTI needs to keep track of some information associated with the "real" type of each object. So, since we already have vtables, a sensible idea is to put such information (probably as a pointer to a bigger structure) somewhere in the vtable - for example at the beginning or at the end. This way, you don't have to add yet another hidden pointer inside each object - the one of the vtable suffices both for virtual calls and for type information.

Each polymorphic class will have an associated structure containing its type information, that may include a name, some unique identifier (may not be needed if the name is guaranteed to be unique) and probably pointers to the type information of the base classes (these will be needed for dynamic_cast).
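
A toy version of such a structure, limited to single inheritance. TypeRecord, IsA and the two records are hypothetical and only loosely mirror what a compiler might emit:

```cpp
// Hypothetical per-class record: a name plus a link to the base class record.
struct TypeRecord {
    const char* name;
    const TypeRecord* base;   // null for a root class
};

const TypeRecord kRingType    = { "Ring", 0 };
const TypeRecord kOneRingType = { "OneRing", &kRingType };

// True when an object whose real type is `actual` may be viewed as `target`:
// walk from the real type up through its bases looking for the target.
bool IsA(const TypeRecord* actual, const TypeRecord* target) {
    for (const TypeRecord* t = actual; t != 0; t = t->base)
        if (t == target) return true;
    return false;
}
```

The walk up the base links is the part that a downcast check needs; an exact-type comparison only needs the address of the record itself.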

When you call typeid on a class, you get an instance of type_info, an opaque class that embeds a little part of this information: namely, the class name and the hidden unique identifier (actually, often it will contain only one of these: if the compiler is friendly enough to provide a name, it will probably make sure it's unique to re-use it also as unique identifier).

The name member is near-useless for non-debugging purposes. The real usefulness of type_info is the operator== that it provides; in fact, the main purpose of type_info is that it can be compared, and two type_info objects compare equal if and only if they are the result of typeid on identical types. So, the main use of typeid is to check if the real type of the object behind a pointer/reference is exactly the same as that of another one (or as a type fixed at compile-time). The comparison is usually done by comparing the unique identifier (probably the pointer to the RTTI structure in memory) or the unique name.
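
A minimal sketch of an exact-type check (the classes are hypothetical; ForgedRing exists only to show the difference from dynamic_cast):

```cpp
#include <typeinfo>

struct Ring { virtual ~Ring() {} };
struct OneRing : Ring {};
struct ForgedRing : OneRing {};

// Exact-type test: true only when the object is a OneRing and nothing more derived.
bool IsExactlyOneRing(const Ring& ring) {
    return typeid(ring) == typeid(OneRing);
}
```

Note that typeid is applied to the object (through a reference), not to a pointer. A ForgedRing fails this test although a dynamic_cast to OneRing * would succeed on it, so typeid is the right tool only when "exactly this class" is what you mean.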

dynamic_cast, instead, is a much more complicated beast. The trivial case (casting from a derived class pointer to a base class pointer) is, well, trivial, no runtime check is involved, and it's the only case where it works fine on non-polymorphic classes (which do not have RTTI available).

The second-easiest case is when you try to cast a Base * whose "real" type is Derived * to Derived *. This is just a matter of comparing the RTTI pointer of Derived * with the one linked to the object to be cast; if they match, the cast succeeds.

The general case, instead, is much more complicated. dynamic_cast has to check the class hierarchy of the pointed-to object following some rules that ensure that no meaningless casts are performed (those rules are specified in the C++ standard, §5.2.7 ¶8). This runtime check is performed using the type information described above. If the cast succeeds, you get a properly cast pointer/reference, otherwise you get a NULL (if it's a pointer) or a bad_cast exception (if it's a reference).
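
One example of the general case is a cast between two unrelated base classes of the same object. The hierarchy below is invented for the illustration:

```cpp
struct Wearable { virtual ~Wearable() {} };
struct Cursed   { virtual ~Cursed() {} };
struct OneRing  : Wearable, Cursed {};
struct Bracelet : Wearable {};

// Cross-cast: Wearable and Cursed are unrelated, so only the run-time type
// of the object can tell whether the conversion is meaningful.
Cursed* AsCursed(Wearable* w) {
    return dynamic_cast<Cursed*>(w);
}
```

For a OneRing the result points at its Cursed sub-object; for a Bracelet it is a null pointer. A static_cast could not express this conversion at all, because the two static types have nothing to do with each other.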

What about non-polymorphic types? Well, they are more or less left out from RTTI: typeid will return their static type information (which usually is useless), dynamic_cast will only allow upcasting (which can be checked at compile time), downcasting will fail at compile-time.
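
The difference shows up as soon as typeid is applied through a base reference. A short sketch with made-up class names:

```cpp
#include <typeinfo>

struct PlainBase {};                          // no virtual functions
struct PlainDerived : PlainBase {};
struct PolyBase { virtual ~PolyBase() {} };   // polymorphic
struct PolyDerived : PolyBase {};

bool PlainReportsStaticType() {
    PlainDerived d;
    PlainBase& ref = d;
    return typeid(ref) == typeid(PlainBase);     // static type only
}

bool PolyReportsDynamicType() {
    PolyDerived d;
    PolyBase& ref = d;
    return typeid(ref) == typeid(PolyDerived);   // looked up at run time
}
```

Both functions return true: the non-polymorphic reference is reported as PlainBase, the polymorphic one as PolyDerived.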

The rationale of this decision was probably not to complicate types that are supposed to be simple: extending RTTI to every type by means of a hidden pointer would have made it impossible to have POD types ("Plain Old Data", i.e. stuff along the lines of traditional C structs).

Also, after all, if you don't have virtual methods, you probably don't need dynamic_cast and typeid: if you have object hierarchies that are manipulated by pointer to base classes you must have at least a virtual destructor, otherwise nasty stuff can happen (if you delete a Derived referring to it via a Base * and the destructor is non-virtual, you'll call only the destructor of Base but not the one defined in Derived - definitely not nice).
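
A small sketch of the well-behaved case, with a counter standing in for whatever cleanup Derived would really do (all names are illustrative):

```cpp
struct Base {
    virtual ~Base() {}   // virtual: delete through Base * reaches ~Derived
};

struct Derived : Base {
    int* counter;
    explicit Derived(int* c) : counter(c) {}
    ~Derived() { ++*counter; }   // stands in for real cleanup work
};

// Destroys a Derived through a Base * and reports how often ~Derived ran.
int DestroyThroughBase() {
    int derived_dtor_runs = 0;
    Base* p = new Derived(&derived_dtor_runs);
    delete p;
    return derived_dtor_runs;
}
```

With the virtual destructor in place the function returns 1. Without the virtual keyword the delete would be undefined behaviour, so there is no reliable result to show for that variant.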

Now your other comments can be answered easily:

If you get the type info of a Ring* to a OneRing, it will be Ring*. Why they chose not to put the information on the classes I have no idea. This makes typeid() essentially useless - it even returns a private class with only one member you can access, a char* name! That's like 11/10 on the unfriendliness and uselessness scale - did they do it on purpose?

You get Ring * whenever typeid is applied to the pointer itself, because the type of a pointer is always its static type; to ask about the object, dereference the pointer (typeid(*p) for a Ring * p). Even then you get the real type only if Ring is polymorphic: at least one member function of Ring must be virtual for RTTI to work (and, as noted above, the destructor of Ring should probably be virtual anyway). The usefulness of the type returned by typeid is that it can be compared for equality with other type_info objects - the name member is useful only for debugging purposes (actually, as far as the standard is concerned it could always return an empty string).

You can cast down the inheritance tree safely with a dynamic cast but again you can't do this unless you have and know a common inheritance ancestor for any data you receive.

This is required, because if you have a generic void * the compiler can't know where the vptr is - heck, it doesn't even know if that thing has a vptr, it could simply be an int *! :-)

In other words, the compiler must have some minimum static type information to be able to tell if there's any type information available at runtime to perform the decision (if the class is not polymorphic or it's not even a class the cast fails at compile time) and where this information is (i.e. where the vptr is located in the object - information that can be deduced from the static type of any ancestor).
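
If all you are handed is a void *, the only way forward is to restore a static type you know to be correct and check from there. A sketch, under the assumption that the pointer was originally converted to void * from a Ring * (FromOpaque is a made-up name):

```cpp
struct Ring { virtual ~Ring() {} };
struct OneRing : Ring {};

// `opaque` is assumed to have been produced from a Ring *; converting it back
// to any other type than the one that was stored is undefined behaviour.
OneRing* FromOpaque(void* opaque) {
    Ring* ring = static_cast<Ring*>(opaque);   // restore the known static type
    return dynamic_cast<OneRing*>(ring);       // now a run-time check is possible
}
```

The static_cast is an unchecked promise made by the programmer; only the dynamic_cast that follows it is verified at run time.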

If you fall outside the above scenario, or you don't have RTTI for whatever reason, you are up the proverbial creek, because there is no way to catch an exception or generate an error resulting from an incorrect casting unless you implement some form of RTTI yourself.

That's how the language works; other implementation choices of RTTI (e.g. tracking the type of each pointer) would have been much more complicated (if possible at all) and of little added usefulness. If you want extensive reflection capabilities, C++ is not the way to go; you have to use managed languages.

Little example to summarize:

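#include <iostream>
#include <typeinfo>

using namespace std;

class Base_NoPolymorphic
{
public:
    void Name() { cout<<"Base_NoPolymorphic"<<endl; }
};
class Derived_NoPolymorphic : public Base_NoPolymorphic
{
public:
    void Name() { cout<<"Derived_NoPolymorphic"<<endl; }
};
class Base_Polymorphic
{
public:
    virtual void Name() { cout<<"Base_Polymorphic"<<endl; }
    virtual ~Base_Polymorphic() {}
};
class Derived_Polymorphic : public Base_Polymorphic
{
public:
    virtual void Name() { cout<<"Derived_Polymorphic"<<endl; }
};

int main()
{
    Base_NoPolymorphic bnp; Derived_NoPolymorphic dnp;
    Base_Polymorphic bp; Derived_Polymorphic dp;
    Base_NoPolymorphic * bnpp = &bnp, * dnpp = &dnp;
    Base_Polymorphic * bpp = &bp, * dpp = &dp;
    void * ptr = dpp;
    // typeid is applied to the pointers, so it prints their static type
    cout<<typeid(bnpp).name()<<" - "; bnpp->Name();
    cout<<typeid(dnpp).name()<<" - "; dnpp->Name();
    // This does not compile, because you're trying a cast down the class hierarchy
    // on a non-polymorphic type
    // cout<<dynamic_cast<Derived_NoPolymorphic *>(dnpp)<<endl;
    cout<<typeid(bpp).name()<<" - "; bpp->Name();
    cout<<typeid(dpp).name()<<" - "; dpp->Name();
    // This will succeed (output != 0)
    cout<<dynamic_cast<Derived_Polymorphic *>(dpp)<<endl;
    cout<<typeid(ptr).name()<<endl;
}
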
Output on my machine (g++ 4.5):

P18Base_NoPolymorphic - Base_NoPolymorphic
P18Base_NoPolymorphic - Base_NoPolymorphic
P16Base_Polymorphic - Base_Polymorphic
P16Base_Polymorphic - Derived_Polymorphic
0x7fffd7ad6890
Pv


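which is as expected. Note that typeid is applied here to the pointers themselves, so it reports the static pointer type in every case (Pv is the g++ RTTI "name" for void *); to see the dynamic type of a polymorphic object you have to dereference the pointer, e.g. typeid(*dpp). The difference between the two hierarchies shows up in the method calls made via base class pointer: only the polymorphic one prints the name of the derived class. dynamic_cast succeeds on polymorphic types, and, if you uncomment the other dynamic_cast, you'll see that it fails to compile.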
