Runcible
Runcible

Reputation: 7382

When is it not a good idea to pass by reference?

This is a memory allocation issue that I've never really understood.

void unleashMonkeyFish()  
{  
    MonkeyFish * monkey_fish = new MonkeyFish();
    std::string localname = "Wanda";  
    monkey_fish->setName(localname);  
    monkey_fish->go();  
}  

In the above code, I've created a MonkeyFish object on the heap, assigned it a name, and then unleashed it upon the world. Let's say that ownership of the allocated memory has been transferred to the MonkeyFish object itself - and only the MonkeyFish itself will decide when to die and delete itself.

Now, when I define the "name" data member inside the MonkeyFish class, I can choose one of the following:

std::string name;
std::string & name;

When I define the prototype for the setName() function inside the MonkeyFish class, I can choose one of the following:

void setName( const std::string & parameter_name );
void setName( const std::string parameter_name );

I want to be able to minimize string copies. In fact, I want to eliminate them entirely if I can. So, it seems like I should pass the parameter by reference...right?

What bugs me is that it seems that my localname variable is going to go out of scope once the unleashMonkeyFish() function completes. Does that mean I'm FORCED to pass the parameter by copy? Or can I pass it by reference and "get away with it" somehow?

Basically, I want to avoid these scenarios:

  1. I don't want to set the MonkeyFish's name, only to have the memory for the localname string go away when the unleashMonkeyFish() function terminates. (This seems like it would be very bad.)
  2. I don't want to copy the string if I can help it.
  3. I would prefer not to new localname

What prototype and data member combination should I use?

CLARIFICATION: Several answers suggested using the static keyword to ensure that the memory is not automatically de-allocated when unleashMonkeyFish() ends. Since the ultimate goal of this application is to unleash N MonkeyFish (all of which must have unique names) this is not a viable option. (And yes, MonkeyFish - being fickle creatures - often change their names, sometime several times in a single day.)

EDIT: Greg Hewgil has pointed out that it is illegal to store the name variable as a reference, since it is not being set in the constructor. I'm leaving the mistake in the question as-is, since I think my mistake (and Greg's correction) might be useful to someone seeing this problem for the first time.

Upvotes: 5

Views: 2642

Answers (9)

Frederik Slijkerman
Frederik Slijkerman

Reputation: 6529

In your example code, yes, you are forced to copy the string at least once. The cleanest solution is defining your object like this:

class MonkeyFish {
public:
  void setName( const std::string & parameter_name ) { name = parameter_name; }

private:
  std::string name;
};

This will pass a reference to the local string, which is copied into a permanent string inside the object. Any solutions that involve zero copying are extremely fragile, because you would have to be careful that the string you pass stays alive until after the object is deleted. Better not go there unless it's absolutely necessary, and string copies aren't THAT expensive -- worry about that only when you have to. :-)

Upvotes: 2

Johannes Schaub - litb
Johannes Schaub - litb

Reputation: 507343

One way to do this is to have your string

std::string name;

As the data-member of your object. And then, in the unleashMonkeyFish function create a string like you did, and pass it by reference like you showed

void setName( const std::string & parameter_name ) {
    name = parameter_name;
}

It will do what you want - creating one copy to copy the string into your data-member. It's not like it has to re-allocate a new buffer internally if you assign another string. Probably, assigning a new string just copies a few bytes. std::string has the capability to reserve bytes. So you can call "name.reserve(25);" in your constructor and it will likely not reallocate if you assign something smaller. (i have done tests, and it looks like GCC always reallocates if you assign from another std::string, but not if you assign from a c-string. They say they have a copy-on-write string, which would explain that behavior).

The string you create in the unleashMonkeyFish function will automatically release its allocated resources. That's the key feature of those objects - they manage their own stuff. Classes have a destructor that they use to free allocated resources once objects die, std::string has too. In my opinion, you should not worry about having that std::string local in the function. It will not do anything noticeable to your performance anyway most likely. Some std::string implementations (msvc++ afaik) have a small-buffer optimization: For up to some small limit, they keep characters in an embedded buffer instead of allocating from the heap.

Edit:

As it turns out, there is a better way to do this for classes that have an efficient swap implementation (constant time):

void setName(std::string parameter_name) {
    name.swap(parameter_name);
}

The reason that this is better, is that now the caller knows that the argument is being copied. Return value optimization and similar optimizations can now be applied easily by the compiler. Consider this case, for example

obj.setName("Mr. " + things.getName());

If you had the setName take a reference, then the temporary created in the argument would be bound to that reference, and within setName it would be copied, and after it returns, the temporary would be destroyed - which was a throw-away product anyway. This is only suboptimal, because the temporary itself could have been used, instead of its copy. Having the parameter not a reference will make the caller see that the argument is being copied anyway, and make the optimizer's job much more easy - because it wouldn't have to inline the call to see that the argument is copied anyway.

For further explanation, read the excellent article BoostCon09/Rvalue-References

Upvotes: 6

Gian Paolo
Gian Paolo

Reputation: 514

Just to clarify the terminology, you've created MonkeyFish from the heap (using new) and localname on the stack.

Ok, so storing a reference to an object is perfectly legit, but obviously you must be aware of the scope of that object. Much easier to pass the string by reference, then copy to the class member variable. Unless the string is very large, or your performing this operation a lot (and I mean a lot, a lot) then there's really no need to worry.

Can you clarify exactly why you don't want to copy the string?

Edit

An alternative approach is to create a pool of MonkeyName objects. Each MonkeyName stores a pointer to a string. Then get a new MonkeyName by requesting one from the pool (sets the name on the internal string *). Now pass that into the class by reference and perform a straight pointer swap. Of course, the MonkayName object passed in is changed, but if it goes straight back into the pool, that won't make a difference. The only overhead is then the actual setting of the name when you get the MonkeyName from the pool.

... hope that made some sense :)

Upvotes: 3

lilburne
lilburne

Reputation: 553

As a simple rule of thumb store your data as a copy within a class, and pass and return data by (const) reference, use reference counting pointers wherever possible.

I'm not so concerned about copying a few 1000s bytes of string data, until such time that the profiler says it is a significant cost. OTOH I do care that the data structures that hold several 10s of MBs of data don't get copied.

Upvotes: 2

Thomas L Holaday
Thomas L Holaday

Reputation: 13862

When the compiler sees ...

std::string localname = "Wanda";  

... it will (barring optimization magic) emit 0x57 0x61 0x6E 0x64 0x61 0x00 [Wanda with the null terminator] and store it somewhere in the the static section of your code. Then it will invoke std::string(const char *) and pass it that address. Since the author of the constructor has no way of knowing the lifetime of the supplied const char *, s/he must make a copy. In MonkeyFish::setName(const std::string &), the compiler will see std::string::operator=(const std::string &), and, if your std::string is implemented with copy-on-write semantics, the compiler will emit code to increment the reference count but make no copy.

You will thus pay for one copy. Do you need even one? Do you know at compile time what the names of the MonkeyFish shall be? Do the MonkeyFish ever change their names to something that is not known at compile time? If all the possible names of MonkeyFish are known at compile time, you can avoid all the copying by using a static table of string literals, and implementing MonkeyFish's data member as a const char *.

Upvotes: 2

Crashworks
Crashworks

Reputation: 41482

This is precisely the problem that reference counting is meant to solve. You could use the Boost shared_ptr<> to reference the string object in a way such that it lives at least as long as every pointer at it.

Personally I never trust it, though, preferring to be explicit about the allocation and lifespan of all my objects. litb's solution is preferable.

Upvotes: 2

Greg Hewgill
Greg Hewgill

Reputation: 994599

If you use the following method declaration:

void setName( const std::string & parameter_name );

then you would also use the member declaration:

std::string name;

and the assignment in the setName body:

name = parameter_name;

You cannot declare the name member as a reference because you must initialise a reference member in the object constructor (which means you couldn't set it in setName).

Finally, your std::string implementation probably uses reference counted strings anyway, so no copy of the actual string data is being made in the assignment. If you're that concerned about performance, you had better be intimately familiar with the STL implementation you are using.

Upvotes: 5

James Eichele
James Eichele

Reputation: 119184

If you use a temporary variable to assign the name (as in your sample code) you will eventually have to copy the string to your MonkeyFish object in order to avoid the temporary string object going end-of-scope on you.

As Andrew Flanagan mentioned, you can avoid the string copy by using a local static variable or a constant.

Assuming that that isn't an option, you can at least minimize the number of string copies to exactly one. Pass the string as a reference pointer to setName(), and then perform the copy inside the setName() function itself. This way, you can be sure that the copy is being performed only once.

Upvotes: 1

Andrew Flanagan
Andrew Flanagan

Reputation: 4277

You could make the string in unleashMonkeyFish static but I don't think that really helps anything (and could be quite bad depending on how this is implemented).

I've moved "down" from higher-level languages (like C#, Java) and have hit this same issue recently. I assume that often the only choice is to copy the string.

Upvotes: 1

Related Questions