Reputation: 4314
I'm developing a software where cache locality is king. It uses about 10 different C arrays in complex data structures that have #define MAX_ARRAY_1_SIZE 1000
like defines. The number of elements in this example 1 array can be then between 0-1000. There are arrays of structs within arrays of structs. The amount of memory used is about 163 megabytes.
However, I noticed C's qsort
is slow so I switched to C++
which has much faster std::sort
due to inlining the comparator function call. The wonders of template metaprogramming.
Now I'm wondering whether I should use C++'s more advanced features. I tried std::vector
in just two of the about 10 arrays, only to find out it kills my cache locality, leading to 15% performance reduction when only two arrays of the about 10 were replaced by std::vector
. Yes, I did find out it was due to cache locality: during the critical code path, the array size remains unchanged and it's only populated at program startup time.
Then I tried to see whether there is something that stores the elements in-line, not in a separate dynamically allocated block. I found std::array
. However, std::array
is a fixed size array and the benefit over C arrays is questionable.
I'm looking for a variable-sized array with maximum size and inline element storage. Is there such a thing in standard C++? Or in Boost?
It doesn't hurt if the variable-sized array actually could have infinite maximum size, so that it stores data in an in-line array if the data fits there, and resorts to using an allocated array if the data doesn't fit in the in-line array.
Now, an experienced C++ programmer could write such a template class in few hours, and I could do the same in few days due to using C++ last time about 10 years ago before I had to obtain the std::sort
performance increase now. But I don't want to reinvent the wheel, so I'm looking for existing solutions.
Edit:
To make myself clear, let me provide an example:
struct baz {
int a;
int b;
};
struct bar {
struct baz baz[100];
};
struct foo {
struct bar bar[100];
};
Now struct foo
is one contiguous block of memory.
struct baz {
int a;
int b;
};
struct bar {
std::vector<struct baz> baz;
};
struct foo {
std::vector<struct bar> bar;
};
...now it isn't.
Upvotes: 2
Views: 1000
Reputation: 4314
I decided to implement my own:
template<class C, class sz_t, sz_t maxsz> class inlinearray {
private:
typedef C value_type;
typedef value_type *pointer;
typedef const value_type *const_pointer;
typedef value_type &reference;
typedef const value_type &const_reference;
typedef value_type *iterator;
typedef const value_type *const_iterator;
typedef sz_t size_type;
typedef std::ptrdiff_t difference_type;
typedef std::reverse_iterator<iterator> reverse_iterator;
typedef std::reverse_iterator<const_iterator> const_reverse_iterator;
sz_t sz;
union {
C realarray[maxsz]; // for correct alignment
char array[maxsz*sizeof(C)];
};
public:
inlinearray()
{
sz = 0;
}
~inlinearray(void)
{
clear();
}
void clear(void)
{
sz_t i;
for (i = 0; i < sz; i++)
{
data()[i].~C();
}
sz = 0;
}
template<class sz2_t, sz2_t maxsz2> inlinearray(inlinearray<C,sz2_t,maxsz2> that)
{
size_t i;
sz = that.sz;
for (i = 0; i < sz; i++)
{
push_back(that[i]);
}
}
template<class sz2_t, sz2_t maxsz2> void operator=(inlinearray<C,sz2_t, maxsz2> val2)
{
swap(val2);
}
void fill(const C& val)
{
std::fill_n(begin(), size(), val);
}
C &operator[](sz_t i) noexcept
{
return data()[i];
}
constexpr const C &operator[](sz_t i) const noexcept
{
return data()[i];
}
C at(sz_t i)
{
if (i >= sz)
{
throw std::out_of_range("inlinerray::at() out of range");
}
return data()[i];
}
constexpr const C at(sz_t i) const
{
if (i >= sz)
{
throw std::out_of_range("inlinerray::at() out of range");
}
return data()[i];
}
void push_back(const C &c)
{
if (sz >= maxsz)
{
abort();
}
new (data()+sz) C(c);
sz++;
}
void pop_back() noexcept
{
data()[sz-1].~C();
sz--;
}
template <class sz2_t, sz2_t maxsz2> void swap(inlinearray<C, sz2_t, maxsz2> &that)
{
if (that.sz > maxsz)
{
abort();
}
if (sz > that.maxsz)
{
abort();
}
std::swap_ranges(begin(), end(), that.begin());
std::swap(sz, that.sz);
}
constexpr sz_t size(void) const noexcept { return sz; }
constexpr sz_t max_size(void) const noexcept { return maxsz; }
constexpr bool empty() const noexcept { return sz == 0; }
C *begin() noexcept { return data(); }
C &front() noexcept { return data()[0]; }
C &back() noexcept { return sz == 0 ? data()[0] : data()[sz - 1]; }
constexpr const C &back() const noexcept { return sz == 0 ? data()[0] : data()[sz - 1]; }
C *end() noexcept { return data() + sz; }
C* data() noexcept { return reinterpret_cast<C*>(array); }
const C* data() const noexcept { return reinterpret_cast<const C*>(array); }
const C *begin() const noexcept { return data(); }
const C *end() const noexcept { return data() + sz; }
const C *cbegin() const noexcept { return data(); }
const C *cend() const noexcept { return data() + sz; }
reverse_iterator rbegin() noexcept { return reverse_iterator(end()); }
reverse_iterator rend() noexcept { return reverse_iterator(begin()); }
const_reverse_iterator rbegin() const noexcept { return const_reverse_iterator(end()); }
const_reverse_iterator rend() const noexcept { return const_reverse_iterator(begin()); }
const_reverse_iterator crbegin() const noexcept { return const_reverse_iterator(end()); }
const_reverse_iterator crend() const noexcept { return const_reverse_iterator(begin()); }
};
Usage of the code should be straightforward.
Upvotes: 0
Reputation: 275956
There are many small vector classes out there. Boost has some. Try http://www.boost.org/doc/libs/master/doc/html/container/non_standard_containers.html#container.non_standard_containers.small_vector
An easy hand rolled solition with only a bit of overhead would be a gsl span attached to a variant of vector and std array. There is overhead over the most optimal solution of a few bytes. Interact using the span over the unknown container.
Upvotes: 1