Scott McPeak

Is it undefined behavior to pass a pointer to an unconstructed streambuf object to the ostream constructor?

Question

Does the following program have undefined behavior?

#include <iostream>          // std::{ostream, streambuf}

// The streambuf ctor is protected so we need a wrapper to create one.
struct mystreambuf : public std::streambuf {};

extern mystreambuf sb;       // Not yet constructed.
std::ostream os(&sb);        // Passing "invalid" pointer here?  UB?
mystreambuf sb;              // Now it is constructed.

int main() { return 0; }

It invokes the ostream constructor, passing a pointer to a streambuf object whose lifetime has not yet begun (basic.life p1). Does this constitute undefined behavior?

Attempted answer

If streambuf were a user-written class, then class.cdtor p1 would govern, which says:

For an object with a non-trivial constructor, referring to any non-static member or base class of the object before the constructor begins execution results in undefined behavior. [...]

This language, and its accompanying example, make it clear that merely taking the address of an unconstructed object is not undefined. As far as I can tell, passing that address as a pointer to a user-written function that only stores its value and tests it against nullptr is also not undefined.
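For comparison, here is a minimal sketch of that user-written case. `Buffer` and `Holder` are hypothetical names: `Buffer` has a non-trivial constructor, and `Holder` only stores the pointer and compares it with `nullptr`. Because `&buf` already has type `Buffer*`, no derived-to-base conversion takes place before `buf` is constructed.

```cpp
#include <cassert>
#include <string>

// Hypothetical class with a non-trivial constructor.
struct Buffer {
    std::string data;
    Buffer() : data("constructed") {}
};

// Hypothetical holder: it only stores the pointer and tests it against
// nullptr, mirroring the two uses listed in Table 127.
struct Holder {
    Buffer* ptr;
    bool non_null;
    explicit Holder(Buffer* p) : ptr(p), non_null(p != nullptr) {}
};

extern Buffer buf;    // declared, not yet constructed
Holder holder(&buf);  // takes the address before buf's lifetime begins
Buffer buf;           // buf is constructed here (same translation unit)

// Intended to be called from main, after dynamic initialization is done,
// so dereferencing the stored pointer is fine at that point.
bool holder_refers_to_buf() {
    return holder.ptr == &buf && holder.non_null &&
           holder.ptr->data == "constructed";
}
```

The stored pointer is only dereferenced in `holder_refers_to_buf`, after `buf` has been constructed.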

But streambuf is a library class, so instead res.on.arguments p1 applies, which says, in part:

If an argument to a function has an invalid value (such as a value outside the domain of the function or a pointer invalid for its intended use), the behavior is undefined.

But what constitutes an "invalid value"? Presumably we have to determine the "intended use" by reading the specification of the called function. The constructor spec ostream.cons p1 says in part:

Effects: Initializes the base class subobject with basic_ios<charT, traits>::init(sb) ([basic.ios.cons]).

The spec for init basic.ios.cons p4 says:

Postconditions: The postconditions of this function are indicated in Table 127.

where Table 127 has two rows that mention sb:

Element    Value
-------    -----
rdbuf()    sb
rdstate()  goodbit if sb is not a null pointer, otherwise badbit.

So, at first glance, this would seem to suggest that sb is only stored (so that rdbuf() can return it) and tested for being nullptr; and that these together comprise its "intended use". Since both of these would be legal for user-written code to do, it is legal to pass the pointer in question, so the program has defined behavior.
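The two Table 127 rows can be observed directly. This sketch uses `std::stringbuf` as an already-constructed buffer and a hypothetical helper `observe_init`; it checks only what the table guarantees and says nothing about what else `init` might do.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Results for the two Table 127 entries that mention sb.
struct InitObservation {
    bool null_rdbuf_is_null;
    bool null_state_is_bad;
    bool buf_rdbuf_matches;
    bool buf_state_is_good;
};

InitObservation observe_init() {
    std::ostream unbuffered(nullptr);  // init(nullptr): rdbuf() == nullptr, badbit
    std::stringbuf sb;                 // fully constructed before use
    std::ostream buffered(&sb);        // init(&sb): rdbuf() == &sb, goodbit

    return {
        unbuffered.rdbuf() == nullptr,
        unbuffered.rdstate() == std::ios_base::badbit,
        buffered.rdbuf() == &sb,
        buffered.rdstate() == std::ios_base::goodbit,
    };
}
```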

But Table 127 is merely a list of postconditions. It does not definitively assert that nothing else is in the scope of "intended use". For that, it would seem necessary to exhaustively review everything that basic_ostream and its subclasses potentially do with sb.

While attempting to do so, I find imbue at basic.ios.members p9:

Effects: Calls ios_base::imbue(loc) and if rdbuf() != 0 then rdbuf()->pubimbue(loc).

Clearly, calling rdbuf()->pubimbue(loc) before the object pointed to by rdbuf() is constructed is undefined. Do we call imbue? Not explicitly of course, and there's no particular reason to suspect an indirect call either, but the existence of this behavior arguably puts it in scope of the "intended use" of the pointer passed to the constructor, since eventually it could be used this way. Furthermore, would it necessarily be non-conforming for an implementation to call imbue on its own during the ostream constructor? I don't see why it would be, and if an implementation is free to call imbue in the constructor, then clearly we have undefined behavior. And there could be other methods that suggest other usages, as my survey was by no means complete.
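The forwarding described in basic.ios.members p9 can be observed with a buffer that counts calls to its protected virtual `imbue`, which `pubimbue` invokes. `CountingBuf` and `imbue_calls_from_stream_imbue` are hypothetical names. The sketch only shows that one explicit `os.imbue(...)` reaches the buffer once; it does not show whether any constructor would do this on its own.

```cpp
#include <cassert>
#include <locale>
#include <ostream>
#include <streambuf>

// Hypothetical buffer counting calls to its protected virtual imbue();
// basic_streambuf::pubimbue(loc) calls imbue(loc).
struct CountingBuf : std::streambuf {
    int imbue_calls = 0;

protected:
    void imbue(const std::locale& loc) override {
        ++imbue_calls;
        std::streambuf::imbue(loc);
    }
};

// Number of imbue() calls that reach the buffer because of one explicit
// basic_ios::imbue call (measured as a difference, so any calls made
// during construction are not counted).
int imbue_calls_from_stream_imbue() {
    CountingBuf buf;
    std::ostream os(&buf);
    const int before = buf.imbue_calls;
    os.imbue(std::locale::classic());  // forwards to rdbuf()->pubimbue(loc)
    return buf.imbue_calls - before;
}
```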

Now, in a comment on an answer to a related question, indi observes that the Clang implementation of std::basic_fstream does pass a pointer to an unconstructed member object to the iostream constructor at fstream:1419:

  basic_filebuf<char_type, traits_type> __sb_;
};

template <class _CharT, class _Traits>
inline basic_fstream<_CharT, _Traits>::basic_fstream() : basic_iostream<char_type, traits_type>(&__sb_) {}

But this example is not definitive because (1) it could be a mistake, and (2) the library implementation is generally allowed to do things that would be undefined in user code. Nevertheless, it is at least weak evidence that the Clang developers think the practice does not have undefined behavior, as they have no reason in this case to write code that relies on the library's license to bend the rules, since it would be a trivial change to instead pass nullptr to the constructor and then in the body call init with the address of the (now fully constructed) member object.
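The "trivial change" mentioned above could look roughly like this. It is a sketch, not libc++ code: `string_ostream` and `buf_` are hypothetical names, and instead of calling `init` a second time, the body installs the buffer with `basic_ios::rdbuf(sb)`, which stores the new buffer and calls `clear()`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stream owning its buffer as a data member. The base is
// constructed with nullptr, so no pointer to the not-yet-constructed
// member is formed during base initialization.
class string_ostream : public std::ostream {
    std::stringbuf buf_;

public:
    string_ostream() : std::ostream(nullptr) {
        // Members are constructed by now. basic_ios::rdbuf(sb) stores sb
        // and calls clear(), resetting the badbit set by init(nullptr).
        rdbuf(&buf_);
    }

    std::string contents() const { return buf_.str(); }
};
```

Because the base is constructed from `nullptr`, `&buf_` is only converted to `std::streambuf*` after `buf_` has been constructed.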

Ultimately, it seems to me that the language specification is ambiguous, as it relies on the terms "invalid value" and "intended use" which are not clearly specified. But perhaps someone can identify a provision I have missed or an error in my interpretations.

Related questions

While researching this, I came across some existing questions that seemed related. The question How to inherit from std::ostream? has three relevant answers:

From these answers and comments, I infer that quite a few knowledgeable people believe that the example at the top of this question has undefined behavior.

Meanwhile, the question Is it dangerous to pass a pointer to a subobject that is not constructed yet to a constructor of another subobject during the object construction? is very nearly the same as mine, but is marred by having some important parts of the example code missing, and involves an extraneous AnotherClass that further muddies the question. The answer by aschepler seems to say that the practice is ok in general, but not in the OP's case because of AnotherClass, but it only reasons as if all of the code were written by the user, ignoring the library aspect.

Finally, the question Is it safe to pass an unconstructed buffer to the constructor of std::ostream? is essentially the same as mine--I'm asking a duplicate! Why? In short, that question has no answers, and I think the additional research in my question makes it more likely mine can be answered, so I'm effectively submitting this with the intention of replacing that one. I asked a meta question about whether asking this duplicate is acceptable, and the consensus seems to be that it is.


I've accepted Chris Dodd's answer, but I want to elaborate a little on it, so this is a restatement of that answer in my own words.

The original example has undefined behavior because, in this line:

std::ostream os(&sb);        // Passing "invalid" pointer here?  UB?

the expression &sb has type mystreambuf*, but is being passed to a constructor that accepts std::streambuf*, and therefore must undergo a derived-to-base conversion. That conversion, applied to a pointer to an unconstructed object with a non-trivial constructor, has undefined behavior because it is a "[reference] to any [...] base class of the object", which class.cdtor p1 prohibits.
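The derived-to-base conversion is not, in general, a mere copy of the address, which is one way to see why the standard treats it as referring to the base subobject. In this sketch (hypothetical classes `First`, `Second` and `Both`), the two base subobjects are distinct non-empty objects, so they cannot share an address and at least one of the upcasts must adjust the pointer. With a virtual base, implementations typically also have to read data stored in the object, which is only set up by construction, to locate the base.

```cpp
#include <cassert>

// Hypothetical classes: Both has two non-empty direct bases.
struct First { int a = 1; };
struct Second { int b = 2; };
struct Both : First, Second { int c = 3; };

// First and Second are distinct non-empty subobjects of `both`, so they
// cannot have the same address: the two upcasts yield different values.
bool upcasts_yield_distinct_addresses(Both& both) {
    const void* as_first = static_cast<First*>(&both);
    const void* as_second = static_cast<Second*>(&both);
    return as_first != as_second;
}
```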

The example in that section further clarifies. Quoting the key lines from it:

struct X { int i; };
struct Y : X { Y(); };                  // non-trivial
struct A { int a; };
struct B : public A { int j; Y y; };    // non-trivial

extern B bobj;
A* pa = &bobj;                          // undefined behavior: upcast to a base class type
B bobj;                                 // definition of bobj

Moreover, this means that not only is the specific example in the question undefined, but it is in general undefined to do what the question title says, namely to "pass a pointer to an unconstructed streambuf object to the ostream constructor". That is because the std::streambuf constructor is protected, so an instance must always be a proper base class subobject, and therefore the only way to obtain a std::streambuf* is with a derived-to-base conversion.
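Within a single translation unit, the program from the question can sidestep the issue by defining the buffer before the stream, since ordered dynamic initialization follows the order of definition there. A minimal sketch (across translation units that order would not be guaranteed, so this relies on both definitions being in the same file):

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>

// The wrapper from the question, unchanged.
struct mystreambuf : public std::streambuf {};

// sb is defined, and therefore constructed, before os, so the
// mystreambuf* -> std::streambuf* conversion sees a live object.
mystreambuf sb;
std::ostream os(&sb);
```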

That implies that the code quoted from the Clang libc++ would have undefined behavior if it were user code, and I have filed Issue #93307 against Clang about that.


Answers (2)

indi

So, @JerryCoffin’s answer is correct, but there is an objection to it on the grounds that while the standard clearly specifies what basic_ios::init() does, it doesn’t specify what it doesn’t do. So (the objection goes), while the standard says that basic_ios::init() compares the passed pointer to nullptr and stores it, it never says those are the only things it does with it… so it might also dereference it, which would trigger UB in the situation described.

Okay, let’s assume that logic makes sense.

So, because basic_ios::init() “might” dereference the pointer, and because the basic_ostream constructor calls basic_ios::init(), we can’t pass a pointer to a member stream buffer. So we can’t do this:

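#include <ostream>
#include <sstream>

class myostream :
    public std::ostream
{
    // std::streambuf's constructor is protected, so a concrete derived
    // buffer (here std::stringbuf) stands in for it.
    std::stringbuf _buf;

public:
    myostream() : std::ostream{&_buf} {}

    // other stuff...
};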

Because although the standard specifies postconditions for the ostream constructor (indirectly, via basic_ios::init()) that amount to comparing the passed pointer to nullptr and keeping a copy… the postconditions are not necessarily exhaustive. So it might dereference the pointer for some unknown reason.

If so, that would be UB. So how would we avoid that?

The solution offered looks like this:

#include <ostream>
#include <streambuf>

class myostream :
    private std::streambuf,
    public std::ostream
{
public:
    myostream() : std::ostream{this} {}

    // other stuff...
};
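Made concrete, the inheritance approach might look like the following sketch. It assumes `std::stringbuf` as the private base (a bare `std::streambuf`'s default `overflow()` reports failure, so output written to it would just set badbit), and `private_base_ostream` is a hypothetical name. Because non-virtual bases are constructed in declaration order (after the virtual `basic_ios` base), the buffer is fully constructed before the `std::ostream` constructor receives a pointer to it.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stream using a private buffer base declared before the
// public std::ostream base, so the buffer is constructed first.
class private_base_ostream :
    private std::stringbuf,
    public std::ostream
{
public:
    private_base_ostream()
        : std::ostream{static_cast<std::stringbuf*>(this)} {}

    // Read back what has been written (qualified to pick the buffer's str()).
    std::string contents() const { return std::stringbuf::str(); }
};
```

This is essentially the base-from-member idiom; Boost packages the general pattern as `boost::base_from_member`.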

So, great! Problem solved, right?

Well, no.

Because, you see, the standard doesn’t say that the ostream constructor or basic_ios::init() don’t delete the pointer passed.

basic_ios::init() might do this:

auto basic_ios::init(streambuf* p_buf)
{
    // do all the stuff init() is specified to do, and then...

    delete p_buf;
}

Why not? The postconditions don’t say explicitly that the stream buffer pointed to by the argument won’t be deleted, and deleting it doesn’t contradict the postconditions.

Or maybe it does this:

auto basic_ios::init(streambuf* p_buf)
{
    // do all the stuff init() is specified to do, and then...

    p_buf->~streambuf();
    
    ::new (static_cast<void*>(p_buf)) streambuf{};
}

Again, why not? That wouldn’t literally contradict the precise wording of the contract of basic_ios::init() as spelled out in the standard. So it could happen, right?

If you suppose that basic_ios::init() is free to do anything with the pointer that it doesn’t explicitly say it won’t, then your clever inheritance strategy won’t work either. In fact… literally nothing will work. If basic_ios::init() is allowed to do LITERALLY ANYTHING with the pointer you pass it—so long as it doesn’t contradict the explicit wording of the contract—then you can’t assume anything about the stream buffer pointer you pass to it. You can’t assume it won’t be destroyed. You can’t assume it will be destroyed. You can’t assume it won’t be overwritten.

So basically, basic_ios::init() is just impossible to use safely. Which means it is impossible to create our own output streams, because we must call basic_ios::init(), directly or indirectly, at some point (before the destructor, or any member functions).

So, there’s your conclusion. It is just impossible to create your own custom streams or stream buffers, because the standard writers didn’t explicitly rule out every asininely imaginable possible contingency for what might happen with that pointer.

Or… maybe… our logic went off the rails somewhere.

Look, the people writing the standard are not doing it for the sake of a group of D&D players who get off on picking apart the micro-semantics of every single rule clause looking for a way to game the system. The committee has neither the time nor the patience to cater to every absurd rule-twisting fanatic’s desire to find loopholes. They will include as much explicit detail as is necessary for reasonable implementers to produce implementations that behave consistently with each other, on the understanding that readers of the standard will interpret it reasonably.

So let’s approach this like reasonable people.

The standard specifies what basic_ios::init() does with the pointer passed. It says nothing about the pointer being dereferenced, not even a non-normative note suggesting that might be the case.

Yes, it does not explicitly state that the pointer won’t be dereferenced (or deleted, or anything else). But consider this: As I pointed out in another comment, Clang’s libc++ does basically what the first code block above does. If there were a reasonable interpretation of the contract of basic_ios::init() that implied the pointer might be dereferenced… wouldn’t somebody have noticed the problem in the decade or so that libc++ has been in widespread use? Don’t you think that, maybe, a sanitizer or two might have noticed?

And, out of curiosity, I also checked the Microsoft standard library source code. Yup, it does the same thing: passes a pointer to a stream buffer data member. That’s two major, widely-used standard libraries. I don’t know how long that particular standard library has been in use, but again… don’t you think somebody would have raised the issue by now if it were a reasonable interpretation of the standard that the stream buffer pointer might be dereferenced before the stream buffer is constructed?

(And I can’t dig up my copy of Langer & Kreft right now, but I’m pretty sure they do the same thing, too.)

Once again: be reasonable. IOstreams has been in the standard since 1998, and it was a widely used library even before that, going back as far as 1984. The wording has been pored over, revised, and studied in dozens and dozens of defect reports. If “it doesn’t say it doesn’t dereference” were a reasonable interpretation of the standard’s definition of basic_ios::init()… don’t you think someone would have done something about that sometime in the last ~30–40 years? Don’t you think someone working on or with the Microsoft standard library OR Clang’s standard library—or one of the many, many people who have made their own custom streams (including the people making new standard custom streams, like in networking proposals)—would have pointed out the issue?

Be reasonable. The standard doesn’t have to explicitly say the pointer won’t be dereferenced, because that would be a pants-crappingly stupid thing to do to a pointer that you haven’t specified must point to a valid stream buffer. Everything else in basic_ios follows that reasoning: the destructor also doesn’t delete the pointer. Indeed, if basic_ios::init() were allowed to dereference the pointer, that would wildly complicate the process of making a custom stream. And for what? For what gain? Why would the IOStreams library be better if it did allow for basic_ios::init() to dereference the stream pointer? How would that compare to the many ways it would be massively worse if you couldn’t assume it was safe to pass a pointer to a member stream buffer?

Conclusion: The fact that the standard wording doesn’t explicitly state… that the things it explicitly states it does with the stream buffer pointer are the only things it does with it… does not imply it may do any random thing with the stream buffer pointer. Especially things that might create UB if they were done unexpectedly. If it required a pointer to a valid stream buffer, it would say so. It does not, and instead lists a bunch of things that don’t require a valid stream buffer.

Suggestion: Don’t treat the standard like a riddle and pick through its wording looking for traps.

CLEARLY the intention is for basic_ios::init() to just compare the pointer to nullptr and keep a copy. It makes no damn sense to not have that be the implication and instead require stream implementers to resort to gymnastics like multiple inheritance (or dynamic allocation followed by rdbuf() to retrieve the pointer later, or other wacky, circuitous ideas). I mean… why? Why would you design the library like that? That would be absurd. Why would you so unnecessarily hamstring the obvious and safest way to implement a stream with an underlying stream buffer?

tl;dr: 1) @JerryCoffin is correct that the behaviour is defined, by reasonable implication from the standard wording. 2) The first code block is fine, and you can pass a pointer to an uninitialized stream buffer to basic_ios::init(). 3) Two major standard libraries work that way, and have done so for decades without any concern raised. 4) There are no rhetorical traps in the C++ standard.


Chris Dodd

The language you quoted

For an object with a non-trivial constructor, referring to any non-static member or base class of the object before the constructor begins execution results in undefined behavior. [...]

would seem to indicate this is undefined behavior -- you're referring to the base class (std::streambuf) of an object before the constructor has run. What happens in the ostream constructor is irrelevant.

