parapura rajkumar

Reputation: 24403

Problems reading std::string from std::stringstream in C++/CLI code

I have a piece of code that reads the build month and year, like so. Here __DATE__ is as defined in Predefined Macros.

const char* BUILD_DATE = __DATE__;
std::stringstream ss(BUILD_DATE);
std::string month;
size_t year;
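ss >> month;  // month name, e.g. "Mar"
ss >> year;   // first read picks up the day of the month...
ss >> year;   // ...which this second read overwrites with the year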

char buffer[1024];
sprintf(buffer, "Read year=[%d] month=[%s] date=[%s]", static_cast<int>(year), month.c_str(), BUILD_DATE);

When it works correctly the buffer is usually

Read year=[2013] month=[Mar] date=[Mar 9 2013]

But on some runs it is

Read year=[0] month=[M] date=[Mar 9 2013]

or

Read year=[2013] month=[Mar ] date=[Mar 9 2013]

Basically the year is either 0 or the month has an extra space.

The project is an x64/CLR build using Microsoft Visual Studio 2010 SP1 on a Windows 7 box.

I am stumped as to why this happens occasionally. Am I using stringstream incorrectly?

Upvotes: 0

Views: 1704

Answers (2)

parapura rajkumar

Reputation: 24403

I was initially tempted to just delete the question, but I thought I would share my findings in case some other poor soul encounters the same issue. The problem was very mysterious: it never occurred across multiple runs of my application, and it only occurred while running tests, never while debugging a test.

This innocent-looking code

const char* BUILD_DATE = __DATE__;
std::stringstream ss(BUILD_DATE);
std::string month;
size_t year;
ss >> month;
ss >> year;
ss >> year;

was implemented in a C++/CLI DLL. Before I delve into the details, let me explain how the stringstream reads month and year here. To figure out how many characters make up the month, ss >> month has to split the stream's buffer at whitespace. It does this using the current locale, in particular a facet of it called ctype. The ctype facet has a function, ctype::is, which can tell whether a character is a space or not. In a well-behaved C++ application everything just works according to the standard.

Now let's assume that for some reason the ctype facet gets corrupted. Voilà, operator >> cannot determine what is a space and what isn't, and cannot parse correctly. This was precisely what was happening in my case, and the details are below.
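To make the role of the ctype facet concrete, here is a minimal sketch in plain standard C++ (no C++/CLI involved). It does not reproduce the corruption itself. It only imitates a ctype facet that gives wrong answers by installing one whose table marks no character as whitespace. NoSpaceCtype, broken_locale, first_token and is_space_in are names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: a ctype<char> whose classification
// table has the "space" bit cleared for every character.
class NoSpaceCtype : public std::ctype<char> {
public:
    NoSpaceCtype() : std::ctype<char>(make_table()) {}

private:
    static const mask* make_table() {
        static mask table[table_size];
        const mask* classic = classic_table();
        for (std::size_t i = 0; i < table_size; ++i)
            table[i] = static_cast<mask>(classic[i] & ~space);
        return table;
    }
};

// Ask the locale's ctype facet whether c counts as whitespace.
bool is_space_in(const std::locale& loc, char c) {
    return std::use_facet<std::ctype<char> >(loc).is(std::ctype_base::space, c);
}

// A copy of the classic locale with the ctype<char> facet swapped out.
std::locale broken_locale() {
    std::locale loc(std::locale::classic(), new NoSpaceCtype);
    assert(!is_space_in(loc, ' '));  // sanity check on the replacement facet
    return loc;
}

// Extract the first token with operator>>, using the given locale.
std::string first_token(const std::string& text, const std::locale& loc) {
    std::istringstream ss(text);
    ss.imbue(loc);
    std::string token;
    ss >> token;
    return token;
}
```

With the classic locale, first_token("Mar  9 2013", std::locale::classic()) stops at the first space and yields "Mar". With broken_locale() the stream never sees a separator, so the whole string comes back as a single token. The symptoms in my case were different in detail, but the mechanism is the same: operator >> is only as good as the facet it consults.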

The rest of the answer only applies to the std c++ library as provided by Visual Studio 2010 and how it operates under C++/CLI.

Consider some code like so

struct Foo
{
    Foo()
    {
        x = 42;
    }

    ~Foo()
    {
        x = 45;
    }
    int x;
};

Foo myglobal;

void SimpleFunction()
{
    int myx = myglobal.x;
}

int main()
{
    SimpleFunction();

    return 0;
}

Here myglobal is what you call an object with static storage duration. It is guaranteed to be initialized before main is entered, so in SimpleFunction you will always see myx as 42. The lifetime of myglobal is what we typically call per-process, as in it is valid for the lifetime of the program. The Foo::~Foo destructor will only run after main has returned.

Enter C++/CLI and AppDomain

An AppDomain, as per MSDN, provides an isolated environment in which applications execute. For C++/CLI it introduces what I would call objects of appdomain storage duration:

 __declspec(appdomain)   Foo myglobal;

So if you changed the definition of myglobal as above, myglobal.x could potentially hold a different value in each appdomain, somewhat like thread-local storage. Regular C++ objects of static storage duration are initialized and cleaned up during init/exit of your program. I am using init/exit/cleaned up very loosely here, but you get the idea. Objects of appdomain storage duration are initialized and cleaned up during load/unload of the AppDomain.

A typical managed program only uses the default AppDomain so per-process / per-appdomain storage is pretty much the same.

In C++ the static initialization order fiasco is a very common mistake, where objects of static storage duration refer, during their initialization, to other objects of static storage duration that may not have been initialized yet.

Now consider what happens when a per-process variable refers to a per-appdomain variable. Basically, after the AppDomain is unloaded, the per-process variable will refer to junk memory. For those of you wondering what this has to do with the original problem, please bear with me just a tad more.

Visual Studio use_facet implementation

std::use_facet is used to get a facet of interest from the locale. It is used by operator >> to get the ctype facet. It is defined as

template <class Facet> const Facet& use_facet ( const locale& loc );

Notice it returns a reference to Facet. The way it is implemented by VC is

    const _Facet& __CRTDECL use_facet(const locale& _Loc)

    {   // get facet reference from locale
    _BEGIN_LOCK(_LOCK_LOCALE)   // the thread lock, make get atomic
        const locale::facet *_Psave =
            _Facetptr<_Facet>::_Psave;  // static pointer to lazy facet

        size_t _Id = _Facet::id;
        const locale::facet *_Pf = _Loc._Getfacet(_Id);

        if (_Pf != 0)
            ;   // got facet from locale
        else if (_Psave != 0)
            _Pf = _Psave;   // lazy facet already allocated
        else if (_Facet::_Getcat(&_Psave, &_Loc) == (size_t)(-1))

 #if _HAS_EXCEPTIONS

            _THROW_NCEE(bad_cast, _EMPTY_ARGUMENT); // lazy disallowed

 #else /* _HAS_EXCEPTIONS */
            abort();    // lazy disallowed
 #endif /* _HAS_EXCEPTIONS */

        else
            {   // queue up lazy facet for destruction
            _Pf = _Psave;
            _Facetptr<_Facet>::_Psave = _Psave;

            locale::facet *_Pfmod = (_Facet *)_Psave;
            _Pfmod->_Incref();
            _Pfmod->_Register();
            }

        return ((const _Facet&)(*_Pf)); // should be dynamic_cast
    _END_LOCK()
    }

What is happening here is we ask the locale for the facet of interest and store it in

   template<class _Facet>
    struct _Facetptr
    {   // store pointer to lazy facet for use_facet
    __PURE_APPDOMAIN_GLOBAL static const locale::facet *_Psave;
    };

the local cache _Psave so that subsequent calls to get the same facet are faster. The caller of use_facet is not responsible for managing the lifetime of the returned facet, so how are these facets cleaned up? The secret is the last part of the code, with the comment "queue up lazy facet for destruction". The _Pfmod->_Register() call eventually calls this

__PURE_APPDOMAIN_GLOBAL static _Fac_node *_Fac_head = 0;

static void __CLRCALL_OR_CDECL _Fac_tidy()
{   // destroy lazy facets
    _BEGIN_LOCK(_LOCK_LOCALE)   // prevent double delete
        for (; std::_Fac_head != 0; )
        {   // destroy a lazy facet node
            std::_Fac_node *nodeptr = std::_Fac_head;
            std::_Fac_head = nodeptr->_Next;
            _DELETE_CRT(nodeptr);
        }
        _END_LOCK()
}



struct _Fac_tidy_reg_t { ~_Fac_tidy_reg_t() { ::_Fac_tidy(); } };
_AGLOBAL const _Fac_tidy_reg_t _Fac_tidy_reg;


void __CLRCALL_OR_CDECL locale::facet::_Facet_Register(locale::facet *_This)
{   // queue up lazy facet for destruction
    _Fac_head = _NEW_CRT _Fac_node(_Fac_head, _This);
}

Pretty smart, right? It adds all the new'd facets to a linked list and cleans them all up using a static object's destructor. Except there is a slight problem: _Fac_tidy_reg is marked as _AGLOBAL, meaning all created facets are destroyed at the per-appdomain level.

The locale::facet *_Psave, on the other hand, is declared __PURE_APPDOMAIN_GLOBAL, which seems to eventually expand to something meaning per-process. So after the appdomain is cleaned up, the per-process _Psave could potentially point to deleted facet memory. This was precisely my problem.

The way VS2010 unit testing works is that a process called QTAgent runs all your tests. These tests seem to be run in different appdomains on different runs by the same QTAgent process, most likely to keep side effects of previous test runs from affecting subsequent tests. This is all well and good for fully managed code, where pretty much all static storage is either thread or appdomain level, but for C++/CLI code that mixes per-process and per-appdomain storage incorrectly it can be a problem. The reason I could never find the problem while debugging a test is that the unit test infrastructure always seems to spawn a new QTAgent process for debugging, which means a new appdomain and a new process that has none of these problems.
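The lifetime mismatch can be modelled in a few lines of ordinary C++. This is only a toy: there are no AppDomains here, every name is hypothetical, and facets are represented by slots that get flagged dead instead of freed, so that the stale cache entry can be inspected without undefined behaviour. g_cached_slot plays the role of the per-process _Psave, and each Domain object plays the role of an appdomain with its own cleanup list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a heap-allocated facet; "alive" turns false once it is "deleted".
struct FacetSlot {
    bool alive;
};

// Stand-in for the heap. Slots are never erased, only flagged dead.
static std::vector<FacetSlot> g_slots;

// Stand-in for the per-process cache (_Psave): it outlives every Domain.
static int g_cached_slot = -1;

// Stand-in for one appdomain and its per-appdomain cleanup list.
class Domain {
public:
    ~Domain() {
        // Like _Fac_tidy on unload: destroy every facet this domain created.
        for (std::size_t i = 0; i < owned_.size(); ++i)
            g_slots[owned_[i]].alive = false;
        // The modelled bug: g_cached_slot still refers to a dead slot.
    }

    // Like use_facet: create the facet lazily, then serve it from the cache.
    int use_facet() {
        if (g_cached_slot == -1) {
            FacetSlot slot = { true };
            g_slots.push_back(slot);
            g_cached_slot = static_cast<int>(g_slots.size()) - 1;
            owned_.push_back(g_cached_slot);
        }
        return g_cached_slot;
    }

private:
    std::vector<int> owned_;
};

bool slot_alive(int slot) {
    assert(slot >= 0 && static_cast<std::size_t>(slot) < g_slots.size());
    return g_slots[slot].alive;
}

// Two "test runs" in the same process, each in its own domain.
bool second_domain_gets_live_facet() {
    {
        Domain first;
        first.use_facet();  // fills the per-process cache
    }                       // first domain unloads and destroys its facets
    Domain second;
    return slot_alive(second.use_facet());  // served from the stale cache
}
```

In this model second_domain_gets_live_facet() returns false: the second domain is handed a facet that the first domain already destroyed. A single Domain used on its own in a fresh process never sees the problem, which would match what I saw when debugging.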

Upvotes: 5

Thomas Matthews

Reputation: 57678

I suggest trying this to see the actual date string:

cout << "Raw date: " << ss.str() << "\n";

Or stepping with a debugger and looking at the ss variable after it is created.
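A slightly fuller version of that check might look like the sketch below. It is only a suggestion: ParsedDate, parse_build_date and describe are made-up names, and it reads the day into its own variable instead of reusing year. Besides the raw string, it reports whether the extractions succeeded, which would show at a glance whether the stream or the string is at fault.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper type: the three fields of a __DATE__-style string.
struct ParsedDate {
    std::string month;
    unsigned day;
    unsigned year;
    bool ok;  // true if all three extractions succeeded
};

ParsedDate parse_build_date(const char* text) {
    assert(text != 0);
    std::istringstream ss(text);
    ParsedDate result;
    result.day = 0;
    result.year = 0;
    ss >> result.month >> result.day >> result.year;
    result.ok = !ss.fail();
    return result;
}

// One line of diagnostics: the raw input next to what was parsed from it.
std::string describe(const char* text) {
    const ParsedDate d = parse_build_date(text);
    std::ostringstream out;
    out << "raw=[" << text << "] month=[" << d.month << "] day=" << d.day
        << " year=" << d.year << " ok=" << (d.ok ? "yes" : "no");
    return out.str();
}
```

Calling describe(__DATE__) on each run and logging the result would give one line to compare between a good run and a bad one.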

Upvotes: 0
