user8024280
user8024280

Reputation:

Prevent unnecessary copying between large structs

I have huge structs DataFrom and Data (which have different members in reality). Data is Created from DataFrom.

struct DataFrom{
    int a = 1;
    int b = 2;
};
static DataFrom dataFrom;   
struct Data{
    int a;
    int b;
};    
class DataHandler{
    public:
        static Data getData(const DataFrom& data2){
            Data data;
            setA(data, data2);
            setB(data, data2);
            return data;
        }
    private:
        static void setA(Data& dest, const DataFrom& source){
            dest.a = source.a;
        }
        static void setB(Data& dest, const DataFrom& source){
            dest.b = source.b;
        }
};
int main(){
    auto data = DataHandler2::getData(dataFrom); // copy of whole Data structure
    // ...
    return 0;
}

As Data is huge, in getData function, there is a copying of whole Data structure. Can this be prevented somehow in elegant way?

I had an idea about:

static void getData( Data& data, const DataFrom& data2);

But I would prefer to retrieve data as a return value, not an output parameter.

Upvotes: 2

Views: 761

Answers (2)

einpoklum
einpoklum

Reputation: 131646

There are two potential "copy hazards" to address here:

Copy hazard 1: The construction outside getData()

On the first line of main(), where you commented "copy of whole Data structure" - as commenters noted, the structure won't actually be copied, due to the Named Return Value Optimization, or NRVO for short. You can read about it in this nice blog post from a few years back:

Fluent{C++}: Return value optimizations

In a nutshell: The compiler arranges it so that data inside the getData function, when it is called from main(), is actually an alias of data in main.

Copy hazard 2: data and data2

The second "copy scare" is with setA() and setB(). Here you must be more pro-active, since you do have two live, valid structs in the same function - data and data2 within getData(). Indeed, if Data and DataFrom are simply large structs - then you will be doing a lot of copying from data2 to data, the way you wrote your code.

Move semantics to the rescue

If, however, your DataFrom holds a reference to some allocated storage, say, std::vector<int> a instead of int[10000] a - you could move from your DataFrom instead of copying from it - by having getData() with the signature static Data getData(DataFrom&& data2). Read more about moving here:

What is move semantics?

In my example, this would mean you would now use the raw buffer of data2.a for your data - without copying the contents of that buffer anywhere else. But that would mean you can no longer use data2 afterwards, since its a field has been cannibalized, moved from.

... or just be "lazy".

Instead of a move-based approach, you might try something else. Suppose you defined something like this:

class Data {
protected:
    DataFrom& source_;

public:  
    int& a() { return source_.a; }
    int& b() { return source_.b; }

public:
    Data(DataFrom& source) : source_(source) { }
    Data(Data& other) : source_(other.source) { }
    // copy constructor?
    // assignment operators? 
};    

Now Data is not a simple struct; it is more of a facade for a DataFrom (and perhaps some other fields and methods). That's a bit less convenient, but the benefit is that you now create a Data with merely a reference to a DataFrom and no copying of anything else. On access, you may need to dereference a pointer.


Other notes:

  • Your DataHandler is defined as a class, but it looks like it serves as just a namespace. You never instantiate "data handlers". Consider reading:

    Why and how should I use namespaces in C++?

  • My suggestions do not involve any C++17. Move semantics were introduced in C++11, and if you choose the "lazy" approach - that would work even in C++98.

Upvotes: 7

Bitwize
Bitwize

Reputation: 11220

Since you've tagged this with , you can write your code in a way that prevents any copying from taking place (or being able to take place), and if it compiles you will know that statically no copies will be made.

C++17 guarantees copy elision when returned from functions, which ensures no copies take place in certain circumstances. You can ensure this by changing your code so that Data has an = deleted copy-constructor, and changing getData to return a constructed object. If the code compiles correctly at all, you will be sure that no copy occurred (since copying would trigger a compile-error)

#include <iostream>

struct DataFrom{
    int a = 1;
    int b = 2;
};
static DataFrom dataFrom;   
struct Data{
    Data() = default;
    Data(const Data&) = delete; // No copy
    int a;
    int b;
};    
class DataHandler{
    public:
        static Data getData(const DataFrom& data2){
            // construct it during return
            return Data{data2.a, data2.b};
        }
    private:
        static void setA(Data& dest, const DataFrom& source){
            dest.a = source.a;
        }
        static void setB(Data& dest, const DataFrom& source){
            dest.b = source.b;
        }
};
int main(){
    auto data = DataHandler::getData(dataFrom); // copy of whole Data structure

    return 0;
}

This will compile without any extra copies -- you can see it here on compiler explorer

Upvotes: 2

Related Questions