AlgoRythm
AlgoRythm

Reputation: 1389

Why is this vector implementation more performant?

For learning purposes, I decided to implement my own vector data structure. I called it list because that seems to generally be the more proper name for it but that's unimportant.

I am halfway through implementing this class (inserting and getting are complete) and I decide to write some benchmarks with surprising results.

My compiler is whatever Visual Studio 2019 uses. I have tried debug and release, in x64 and x86.

For some reason, my implementation is faster than vector and I cannot think of a reason why. I fear that either my implementation or testing method are flawed.

Here are my results (x64, debug):

List: 13269ms
Vector: 78515ms

Release has a much less drastic, but still apparent, difference.

List: 65ms
Vector: 247ms

Here is my code

dataset.hpp:

#ifndef DATASET_H
#define DATASET_H

#include <memory>
#include <stdexcept>
#include <algorithm>
#include <functional>
#include <chrono>

namespace Dataset {
    template <class T>
    class List {
    public:
        List();
        List(unsigned int);
        void push(T);
        T& get(int);
        void reserve(int);
        void shrink();
        int count();
        int capacity();
        ~List();
    private:
        void checkCapacity(int);
        void setCapacity(int);
        char* buffer;
        int mCount, mCapacity;
    };


    template <class T>
    List<T>::List() {
        mCount = 0;
        mCapacity = 0;
        buffer = 0;
        setCapacity(64);
    }
    template <class T>
    List<T>::List(unsigned int initcap) {
        mCount = 0;
        buffer = 0;
        setCapacity(initcap);
    }
    template <class T>
    void List<T>::push(T item) {
        checkCapacity(1);
        new(buffer + (sizeof(T) * mCount++)) T(item);
    }
    template <class T>
    T& List<T>::get(int index) {
        return *((T*)(buffer + (sizeof(T) * index)));
    }
    template <class T>
    void List<T>::reserve(int desired) {
        if (desired > mCapacity) {
            setCapacity(desired);
        }
    }
    template <class T>
    void List<T>::shrink() {
        if (mCapacity > mCount) {
            setCapacity(mCount);
        }
    }
    template <class T>
    int List<T>::count() {
        return mCount;
    }
    template <class T>
    int List<T>::capacity() {
        return mCapacity;
    }
    template <class T>
    void List<T>::checkCapacity(int cap) {
        // Can <cap> more items fit in the list? If not, expand!
        if (mCount + cap > mCapacity) {
            setCapacity((int)((float)mCapacity * 1.5));
        }
    }
    template <class T>
    void List<T>::setCapacity(int cap) {
        mCapacity = cap;
        // Does buffer exist yet?
        if (!buffer) {
            // Allocate a new buffer
            buffer = new char[sizeof(T) * cap];
        }
        else {
            // Reallocate the old buffer
            char* newBuffer = new char[sizeof(T) * cap];
            if (newBuffer) {
                std::copy(buffer, buffer + (sizeof(T) * mCount), newBuffer);
                delete[] buffer;
                buffer = newBuffer;
            }
            else {
                throw std::runtime_error("Allocation failed");
            }
        }
    }
    template <class T>
    List<T>::~List() {
        for (int i = 0; i < mCount; i++) {
            get(i).~T();
        }
        delete[] buffer;
    }

    long benchmark(std::function<void()>);
    long benchmark(std::function<void()>, long);

    long benchmark(std::function<void()> f) {
        return benchmark(f, 100000);
    }

    long benchmark(std::function<void()> f, long iters) {
        using std::chrono::high_resolution_clock;
        using std::chrono::duration_cast;
        auto start = high_resolution_clock::now();
        for (long i = 0; i < iters; i++) {
            f();
        }
        auto end = high_resolution_clock::now();
        auto time = duration_cast<std::chrono::milliseconds>(end - start);
        return (long)time.count();
    }

}


#endif

test.cpp:

#include "dataset.hpp"
#include <iostream>
#include <vector>


/*

    TEST CODE

*/


class SimpleClass {
public:
    SimpleClass();
    SimpleClass(int);
    SimpleClass(const SimpleClass&);
    void sayHello();
    ~SimpleClass();
private:
    int data;
};

SimpleClass::SimpleClass() {
    //std::cout << "Constructed " << this << std::endl;
    data = 0;
}

SimpleClass::SimpleClass(int data) {
    //std::cout << "Constructed " << this << std::endl;
    this->data = data;
}

SimpleClass::SimpleClass(const SimpleClass& other) {
    //std::cout << "Copied to " << this << std::endl;
    data = other.data;
}

SimpleClass::~SimpleClass() {
    //std::cout << "Deconstructed " << this << std::endl;
}

void SimpleClass::sayHello() {
    std::cout << "Hello! I am #" << data << std::endl;
}

int main() {

    long list = Dataset::benchmark([]() {
        Dataset::List<SimpleClass> list = Dataset::List<SimpleClass>(1000);
        for (int i = 0; i < 1000; i++) {
            list.push(SimpleClass(i));
        }
    });

    long vec = Dataset::benchmark([]() {
        std::vector<SimpleClass> list = std::vector<SimpleClass>(1000);
        for (int i = 0; i < 1000; i++) {
            list.emplace_back(SimpleClass(i));
        }
    });

    std::cout << "List: " << list << "ms" << std::endl;
    std::cout << "Vector: " << vec << "ms" << std::endl;

    return 0;
}

Upvotes: 0

Views: 247

Answers (2)

Davislor
Davislor

Reputation: 15154

This is a really nice start! Clean and simple solution to the exercise. Sadly, your instincts are right that you weren’t testing enough cases.

One thing that jumps out at me is that you never resize your vectors, and therefore don’t measure how most STL implementations can often avoid copying when they grow in size. It also never returns any memory to the heap when it shrinks. You also don’t say whether you were compiling with /Oz to enable optimizations. But my guess is that there’s a small amount of overhead in Microsoft’s implementation, and it would pay off in other tests (especially an array of non-trivially-copyable data that needs to be resized, or a series of vectors that start out big but can be filtered and shrunk, or storing lots of data that can be moved instead of copied).

One bug that jumps out at me is that you call new[] to allocate a buffer of char—which is not guaranteed to meet the alignment requirements of T. On some CPUs, that can crash the program.

Another is that you use std::copy with an uninitialized area of memory as the destination in List::setCapacity. That doesn’t work except in special cases: std::copy expects a validly-initialized object that can be assigned to. For any type where assignment is a non-trivial operation, this will fail when the program tries to call a destructor on garbage data. If that happens to work, the move will then inefficiently clone the data and destroy the original, rather than using the move constructor if one exists. The STL algorithm you really want here is std::uninitialized_move. You might also want to use calloc/realloc, which allows resizing blocks.

Your capacity and size members should be size_t rather than int. This not only limits the size to less memory than most implementations can address, calculating a size greater than INT_MAX (i.e., 2 GiB or more on most implementations) causes undefined behavior.

One thing List::push has going for it is that it uses the semantics of std::vector::emplace_back (which you realize, and use as your comparison). It could, however, be improved. You pass item in by value, rather than by const reference. This creates an unnecessary copy of the data. Fortunately, if T has a move constructor, the extra copy can be moved, and if item is an xvalue, the compiler might be able to optimize the copy away, but it would be better to have List::push(const T&) and List::push(T&&). This will let the class push an xvalue without making any copies at all.

List::get is better, and avoids making copies, but it does not have a const version, so a const List<T> cannot do anything. It also does not check bounds.

Consider putting the code to look up the position of an index within the buffer into a private inline member function, which would drastically cut down the amount of work you will need to do to fix design changes (such as the ones you will need to fix the data-alignment bug).

Upvotes: 1

Slava
Slava

Reputation: 44268

std::vector constructor with one parameter creates vector with count elements:

explicit vector( size_type count, const Allocator& alloc = Allocator() );

To have something comparable for vector you have to do:

std::vector<SimpleClass> list;
list.reserve( 1000 );

also your "vector" copies objects it holds by simply copying memory, which is only allowed for trivially copyable objects, and SimpleClass is not one of them as it has user defined constuctors.

Upvotes: 2

Related Questions