John Smith
John Smith

Reputation: 12807

How is vector implemented in C++

I am thinking of how I can implement std::vector from the ground up.

How does it resize the vector?

realloc only seems to work for plain old stucts, or am I wrong?

Upvotes: 58

Views: 42863

Answers (9)

Evan Teran
Evan Teran

Reputation: 90563

it is a simple templated class which wraps a native array. It does not use malloc/realloc. Instead, it uses the passed allocator (which by default is std::allocator).

Resizing is done by allocating a new array and copy constructing each element in the new array from the old one (this way it is safe for non-POD objects). To avoid frequent allocations, often they follow a non-linear growth pattern.

UPDATE: in C++11, the elements will be moved instead of copy constructed if it is possible for the stored type.

In addition to this, it will need to store the current "size" and "capacity". Size is how many elements are actually in the vector. Capacity is how many could be in the vector.

So as a starting point a vector will need to look somewhat like this:

template <class T, class A = std::allocator<T> >
class vector {
public:
    // public member functions
private:
    // NOTE: a "real" implementation would use the typedefs provided by the allocator. 
    // But I'm keeping it simple for clarity
    T*          data_;
    std::size_t capacity_;
    std::size_t size_;
    A           allocator_;
};

The other common implementation is to store pointers to the different parts of the array. This cheapens the cost of end() (which no longer needs an addition) ever so slightly at the expense of a marginally more expensive size() call (which now needs a subtraction). In which case it could look like this:

template <class T, class A = std::allocator<T> >
class vector {
public:
    // public member functions
private:
    // NOTE: a "real" implementation would use the typedefs provided by the allocator. 
    // But I'm keeping it simple for clarity
    T* data_;         // points to first element
    T* end_capacity_; // points to one past internal storage
    T* end_;          // points to one past last element
    A  allocator_;
};

I believe gcc's libstdc++ uses the latter approach, but both approaches are equally valid and conforming.

NOTE: This is ignoring a common optimization where the empty base class optimization is used for the allocator. I think that is a quality of implementation detail, and not a matter of correctness.

Upvotes: 61

Dattatraya Mengudale
Dattatraya Mengudale

Reputation: 336

    ///Implement Vector class
    class MyVector {
        int *int_arr;
        int capacity;
        int current;
    public:
        MyVector() {
            int_arr = new int[1];
            capacity = 1;
            current = 0;
        }
        void Push(int nData);
        void PushData(int nData, int index);
        void PopData();
        int  GetData(int index);
        int  GetSize();
        void Print();
    };

    void MyVector::Push(int data)
    {
        if (current == capacity){
            int *temp = new int[2 * capacity];
            for (int i = 0; i < capacity; i++)
            {
                temp[i] = int_arr[i];
            }

            delete[] int_arr;
            capacity *= 2;

            int_arr = temp;
        }
        int_arr[current] = data;
        current++;
    }
    void MyVector::PushData(int data, int index)
    {
        if (index == capacity){
            Push(index);
        }
        else
            int_arr[index] = data;
    }
    void MyVector::PopData(){
        current--;
    }

    int MyVector::GetData(int index)
    {
        if (index < current){
            return int_arr[index];
        }
    }

    int MyVector::GetSize()
    {
        return current;
    }

    void MyVector::Print()
    {
        for (int i = 0; i < current; i++) {
            cout << int_arr[i] << " ";
        }
        cout << endl;
    }
    
    int main()
    {
        MyVector vect;
        vect.Push(10);
        vect.Push(20);
        vect.Push(30);
        vect.Push(40);

        vect.Print();

        std::cout << "\nTop item is "
            << vect.GetData(3) << std::endl;

        vect.PopData();
        vect.Print();

        cout << "\nTop item is "
            << vect.GetData(1) << endl;
        return 0;
    }

Upvotes: 4

Manish Chandra Joshi
Manish Chandra Joshi

Reputation: 431

You can implement them with resizing array implementation. When the array becomes full, create an array with twice as much the size and copy all the content to the new array. Do not forget to delete the old array.

As for deleting the elements from vector, do resizing when your array becomes a quarter full. This strategy makes prevents any performance glitches when one might try repeated insertion and deletion at half the array size.

It can be mathematically proved that the amortized time (Average time) for insertions is still linear for n insertions which is asymptotically the same as you will get with a normal static array.

Upvotes: 0

log0
log0

Reputation: 10947

Like this: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_vector.h

(official gcc mirror on github)

Upvotes: 4

Serapth
Serapth

Reputation: 7172

From Wikipedia, as good an answer as any.

A typical vector implementation consists, internally, of a pointer to a dynamically allocated array,[2] and possibly data members holding the capacity and size of the vector. The size of the vector refers to the actual number of elements, while the capacity refers to the size of the internal array. When new elements are inserted, if the new size of the vector becomes larger than its capacity, reallocation occurs.[2][4] This typically causes the vector to allocate a new region of storage, move the previously held elements to the new region of storage, and free the old region. Because the addresses of the elements change during this process, any references or iterators to elements in the vector become invalidated.[5] Using an invalidated reference causes undefined behaviour

Upvotes: 6

Owen S.
Owen S.

Reputation: 7855

You'd need to define what you mean by "plain old structs."

realloc by itself only creates a block of uninitialized memory. It does no object allocation. For C structs, this suffices, but for C++ it does not.

That's not to say you couldn't use realloc. But if you were to use it (note you wouldn't be reimplementing std::vector exactly in this case!), you'd need to:

  1. Make sure you're consistently using malloc/realloc/free throughout your class.
  2. Use "placement new" to initialize objects in your memory chunk.
  3. Explicitly call destructors to clean up objects before freeing your memory chunk.

This is actually pretty close to what vector does in my implementation (GCC/glib), except it uses the C++ low-level routines ::operator new and ::operator delete to do the raw memory management instead of malloc and free, rewrites the realloc routine using these primitives, and delegates all of this behavior to an allocator object that can be replaced with a custom implementation.

Since vector is a template, you actually should have its source to look at if you want a reference – if you can get past the preponderance of underscores, it shouldn't be too hard to read. If you're on a Unix box using GCC, try looking for /usr/include/c++/version/vector or thereabouts.

Upvotes: 2

Jerry Coffin
Jerry Coffin

Reputation: 490768

Resizing the vector requires allocating a new chunk of space, and copying the existing data to the new space (thus, the requirement that items placed into a vector can be copied).

Note that it does not use new [] either -- it uses the allocator that's passed, but that's required to allocate raw memory, not an array of objects like new [] does. You then need to use placement new to construct objects in place. [Edit: well, you could technically use new char[size], and use that as raw memory, but I can't quite imagine anybody writing an allocator like that.]

When the current allocation is exhausted and a new block of memory needs to be allocated, the size must be increased by a constant factor compared to the old size to meet the requirement for amortized constant complexity for push_back. Though many web sites (and such) call this doubling the size, a factor around 1.5 to 1.6 usually works better. In particular, this generally improves chances of re-using freed blocks for future allocations.

Upvotes: 5

Edward Strange
Edward Strange

Reputation: 40895

realloc only works on heap memory. In C++ you usually want to use the free store.

Upvotes: -6

Macke
Macke

Reputation: 25710

It allocates a new array and copies everything over. So, expanding it is quite inefficient if you have to do it often. Use reserve() if you have to use push_back().

Upvotes: 3

Related Questions