jules

Reputation: 63

C++ Shallow and deep copying - reflecting changes in the num_items of a vector

I'm currently taking a C++ course at university. I understand the general concept of shallow and deep copying using vectors; however, there's an example in my textbook that has me confused.

Please assume that it is a poorly implemented vector with no copy constructor defined so that it only performs a shallow copy of the data.

I understand what's happening in the first part:

In the statement

vector<int> v2(v1);

vector v1 is passed as a const reference argument to the vector copy constructor, so v1 can’t be changed, and the variable v2 is then initialized to a copy of the vector v1. Each data field will be copied, and any changes made later to v2 should not affect v1. When the value in v1.the_data is copied over, both v1.the_data and v2.the_data will point to the same array.

Because v1.the_data and v2.the_data point to the same object, the statement

v1[2] = 10;

also changes v2[2]. For this reason, v2 is considered a shallow copy of v1.

However, I'm struggling to understand this part: I'm not quite sure why v2.num_items won't also change in a shallow copy.

The statement

v1.push_back(20);

will insert 20 into v1[5] and will change v1.num_items to 6, but will not change v2.num_items.

My current thinking is that v1.the_data and v2.the_data point to the same place in memory, so they 'share' the same vector, and when 20 is added to the end of it, both vectors should gain an additional integer.

I would greatly appreciate assistance in understanding why the number of items won't change for v2 when v1 is modified.

Upvotes: 2

Views: 1553

Answers (4)

James Kanze

Reputation: 153977

The statement seems to assume a particular implementation of vector (one that does not conform to std::vector). Suppose, for example, we have a very naïve implementation:

template <typename T>
class Vector
{
    T* myData;
    int mySize;
    int myCapacity;
public:
    void push_back( T const& newValue )
    {
        if ( mySize == myCapacity ) {
            //  Extend the capacity...
        }
        myData[mySize] = newValue;
        ++ mySize;
    }
    T& operator[]( int index )
    {
        return myData[index];
    }
};

If you don't have a copy constructor, when you copy the vector, all three variables will end up the same: both vectors will have a pointer to the same data, the same size and the same capacity. But these are copies: when you use [], you modify the memory pointed to by myData, which is the same in both vectors; when you do the push_back on v1, you update the size of v1, in its local copy of the size.
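To make this concrete, here is a minimal sketch of the textbook scenario. It assumes a hypothetical NaiveVector that uses the textbook's member names the_data and num_items; capacity, release, Outcome and run_shallow_copy_demo are names invented for illustration, and growth is left out entirely.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical NaiveVector: no user-defined copy constructor, so the
// compiler-generated one copies the pointer and both counters member by member.
struct NaiveVector {
    int* the_data;
    std::size_t num_items;
    std::size_t capacity;

    explicit NaiveVector(std::size_t cap)
        : the_data(new int[cap]()), num_items(0), capacity(cap) {}

    // Deliberately no destructor: two shallow copies would otherwise
    // delete[] the same array twice. The array must be released exactly once.
    void release() { delete[] the_data; the_data = nullptr; }

    void push_back(int value) {
        assert(num_items < capacity);  // growing the array is omitted in this sketch
        the_data[num_items] = value;
        ++num_items;                   // only this object's counter changes
    }

    int& operator[](std::size_t index) { return the_data[index]; }
};

// What each object observes after the textbook's three statements.
struct Outcome {
    int v2_elem2;
    std::size_t v1_items;
    std::size_t v2_items;
    int shared_slot5;
};

Outcome run_shallow_copy_demo() {
    NaiveVector v1(8);
    for (int i = 0; i < 5; ++i) v1.push_back(i);

    NaiveVector v2(v1);   // shallow copy: same pointer, separate counters

    v1[2] = 10;           // writes through the shared pointer
    v1.push_back(20);     // writes the_data[5], bumps v1.num_items only

    Outcome out{v2[2], v1.num_items, v2.num_items, v2.the_data[5]};
    v1.release();         // free the shared array once
    return out;
}
```

Calling run_shallow_copy_demo() gives v2_elem2 == 10, v1_items == 6 and v2_items == 5. The value 20 does sit in the shared array at index 5, but v2 has no way of knowing that, because its own num_items still says 5.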

Of course, this implementation is naïve in a lot of ways. A good implementation of something like std::vector requires a fair amount of thought, not just because it requires deep copy semantics, but also for reasons of exception safety (the constructor of T might throw), and to avoid imposing unnecessary requirements (in particular, a default constructor).

Also, if I were trying to use a poorly implemented vector as an example of shallow copy, I wouldn't call it vector, since that immediately conjures up the image of std::vector, which shouldn't be poorly implemented (and isn't in the library implementations I know).

Upvotes: 2

quantdev

Reputation: 23813

Assuming we are talking about the standard std::vector:

When you copy the vector in this statement:

vector<int> v2(v1);

v2 is built by copying each element of v1. v1 and v2 do not share any of their memory.

This part:

both v1.the_data and v2.the_data will point to the same array

Because v1.the_data and v2.the_data point to the same object,

is wrong.

You can convince yourself by comparing the addresses of the two vectors' underlying arrays, which you can get with the data() member function.
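A short sketch of that check with std::vector. The two helper function names are made up for illustration; everything else is the standard library.

```cpp
#include <cassert>
#include <vector>

// Returns true when a copy-constructed std::vector has its own storage.
bool copy_has_separate_storage() {
    std::vector<int> v1{0, 1, 2, 3, 4};
    std::vector<int> v2(v1);         // copy constructor: copies every element
    return v1.data() != v2.data();   // different underlying arrays
}

// Returns true when changes made to v1 after the copy leave v2 untouched.
bool copy_is_independent() {
    std::vector<int> v1{0, 1, 2, 3, 4};
    std::vector<int> v2(v1);
    v1[2] = 10;
    v1.push_back(20);
    return v2[2] == 2 && v2.size() == 5 && v1.size() == 6;
}
```

Both functions return true: the copy has its own array, so neither the element assignment nor the push_back on v1 is visible through v2.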

EDIT:

Now assume you are crazy enough not to use std::vector, and instead use an implementation that "shares" its back-end array when copied (I won't go into the issues with this design: who owns the array? who calls delete[] on it?).

The issue raised by your teacher is that when v1 is modified (e.g. an element is added), v2 does not know about it, and has an unchanged size.

Any push_back (or the like) made to one vector should be observed by every other owner of the array, to properly reflect the size of the array.

Either:

1) you implement some kind of observer pattern to have each vector aware of any modification (and it is more difficult than it sounds)

2) you use tricks to store the length in the backend array itself.
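A rough sketch of the second option, under a loud assumption: instead of a raw array with a hand-rolled length header, it keeps the elements and their count together in one shared block by wrapping a std::vector in a std::shared_ptr. SharedVector and size_seen_by_copy_after_push are hypothetical names, and this is only one way to get the effect.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical SharedVector: every copy points at one shared block that
// owns both the elements and their count, so the size cannot go stale.
class SharedVector {
    std::shared_ptr<std::vector<int>> block_;

public:
    SharedVector() : block_(std::make_shared<std::vector<int>>()) {}

    // The implicit copy constructor copies the shared_ptr, not the elements.
    void push_back(int value) { block_->push_back(value); }
    int& operator[](std::size_t index) { return (*block_)[index]; }
    std::size_t size() const { return block_->size(); }
};

// Size reported by a copy after the original has grown by one element.
std::size_t size_seen_by_copy_after_push() {
    SharedVector v1;
    for (int i = 0; i < 5; ++i) v1.push_back(i);

    SharedVector v2(v1);   // shares the block with v1
    v1.push_back(20);      // grows the shared block

    return v2.size();      // 6: the count lives with the data
}
```

Here std::shared_ptr also answers the ownership question, since the block is freed when the last copy goes away. It does nothing for references or iterators into the block, which can still be invalidated by a push_back made through another copy.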

You would run into similar issues with invalidating every iterator when the "shared" array is modified through one of the vectors that refer to it... A nightmare! There are good reasons why the STL containers were all designed to manage their own memory, hence always providing deep copy semantics.

Upvotes: 2

Blastfurnace

Reputation: 18652

Is your textbook talking about std::vector from the standard library? If so, it is wrong. vector<int> v2(v1); copy constructs v2 from v1. This is a deep copy: the two containers don't share storage and are completely separate.

If, instead, this is a badly implemented vector class and the containers share storage then changing an existing element in one will be reflected in the other. An operation like push_back that changed one container's num_items but not the other's would cause them to disagree on their size.

Upvotes: 2

stefaanv

Reputation: 14392

The difficulty in understanding the statement in this question is whether we have to consider vector as std::vector or as a theoretical implementation. std::vector doesn't allow shallow copying, and the reason is given in the statement: the invariants can't be respected with a shallow copy.

Now take the theoretical implementation with "the_data" and "num_items" members. Here copying the vector should give a deep copy, but just copying "the_data" gives a shallow copy because only a pointer is copied. This causes several problems: modifying the actual data through one vector can leave the other in an inconsistent state, and memory management can no longer be done correctly.

Upvotes: 0
