user2961927
user2961927

Reputation: 1728

Easy way of managing the recycling of C++ STL vectors of POD types

My application consists of calling dozens of functions millions of times. In each of those functions, one or a few temporary std::vector containers of POD (plain old data) types are initialized, used, and then destructed. By profiling my code, I find the allocations and deallocations lead to a huge overhead.

A lazy solution is to rewrite all the functions as functors containing those temporary buffer containers as class members. However this would blow up the memory consumption as the functions are many and the buffer sizes are not trivial.

A better way is to analyze the code, gather all the buffers, premeditate how to maximally reuse them, and feed a minimal set of shared buffer containers to the functions as arguments. But this can be too much work.

I want to solve this problem once for all my future development during which temporary POD buffers become necessary, without having to have much premeditation. My idea is to implement a container port, and take the reference to it as an argument for every function that may need temporary buffers. Inside those functions, one should be able to fetch containers of any POD type from the port, and the port should also auto-recall the containers before the functions return.

// Port of vectors of POD types.
struct PODvectorPort 
{
  std::size_t Nlent; // Number of dispatched containers.
  std::vector<std::vector<std::size_t> > X; // Container pool.
  PODvectorPort() { Nlent = 0; }
};


// Functor that manages the port.
struct PODvectorPortOffice
{
  std::size_t initialNlent; // Number of already-dispatched containers
  // when the office is set up.
  PODvectorPort *p; // Pointer to the port.
  
  
  PODvectorPortOffice(PODvectorPort &port) 
  { 
    p = &port; 
    initialNlent = p->Nlent;
  }
  
  
  template<typename X, typename Y>
  std::vector<X> & repaint(std::vector<Y> &y) // Repaint the container.
  {
    // return *((std::vector<X>*)(&y)); // UB although works
    std::vector<X> *rst = nullptr;
    std::memcpy(&rst, &y, std::min(
      sizeof(std::vector<X>*), sizeof(std::vector<Y>*)));
    return *rst; // guess it makes no difference. Should still be UB.
  }
  
  
  template<typename T>
  std::vector<T> & lend()
  {
    ++p->Nlent;
    // Ensure sufficient container pool size:
    while (p->X.size() < p->Nlent) p->X.push_back( std::vector<size_t>(0) );
    return repaint<T, std::size_t>( p->X[p->Nlent - 1] );
  }

  
  void recall() { p->Nlent = initialNlent; }
  ~PODvectorPortOffice() { recall(); }
};


struct ArbitraryPODstruct
{
  char a[11]; short b[7]; int c[5]; float d[3]; double e[2];
};


// Example f1():
// f2(), f3(), ..., f50() are similarly defined.
// All functions are called a few million times in certain 
// order in main(). 
// port is defined in main().
void f1(other arguments..., PODvectorPort &port)
{
  
  PODvectorPort portOffice(port);
  
  // Oh, I need a buffer of chars:
  std::vector<char> &tmpchar = portOffice.lend();
  tmpchar.resize(789); // Trivial if container already has sufficient capacity.
  
  // ... do things
  
  // Oh, I need a buffer of shorts:
  std::vector<short> &tmpshort = portOffice.lend();
  tmpshort.resize(456);  // Trivial if container already has sufficient capacity.
  
  // ... do things.
  
  // Oh, I need a buffer of ArbitraryPODstruct:
  std::vector<ArbitraryPODstruct> &tmpArb = portOffice.lend();
  tmpArb.resize(123);  // Trivial if container already has sufficient capacity.
  
  // ... do things.
  
  // Oh, I need a buffer of integers, but also tmpArb is no longer 
  // needed. Why waste it? Cache hot.
  std::vector<int> &tmpint = portOffice.repaint(tmpArb);
  tmpint.resize(300); // Trivial.
  
  // ... do things.
}

Although the code is compliable by both gcc-8.3 and MSVS 2019 with -O2 to -Ofast, and passes extensive tests for all options, I expect criticism due to the hacky nature of PODvectorPortOffice::repaint(), which "casts" the vector type in-place.

A set of sufficient but not necessary conditions for the correctness and efficiency of the above code are:

  1. std::vector<T> stores 3 pointers to the underlying buffer's &[0], &[0] + .size(), &[0] + .capacity().
  2. std::vector<T>'s allocator calls malloc().
  3. malloc() returns an 8-byte (or sizeof(std::size_t)) aligned address.

So, if this is unacceptable to you, what would be the modern, proper way of addressing my need? Is there a way of writing a manager that achieve what my code does only without violating the Standard?

Thanks!

Edits: A little more context of my problem. Those functions mainly compute some simple statistics of the inputs. The inputs are data streams of financial parameters of different types and sizes. To compute the statistics, those data need to be altered and re-arranged first, thus the buffers for temporary copies. Computing the statistics is cheap, thus the allocations and deallocations can become expensive, relatively. Why do I want a manger for arbitrary POD type? Because 2 weeks from now I may start receiving a data stream of a different type, which can be a bunch of primitive types zipped in a struct, or a struct of the composite types encountered so far. I, of course, would like the upper stream to just send separate flows of primitive types, but I have no control of that aspect.

More edits: after tons of reading and code experimenting regarding the strict aliasing rule, the answer should be, don't try everything I put up there --- it works, for now, but don't do it. Instead, I'll be diligent and stick to my previous code-as-you-go style, just add a vector<vector<myNewType> > into the port once a new type comes up, and manage it in a similar way. The accepted answer also offers a nice alternative.

Even more edits: conceived a stronger class that has better chance to thwart potential optimizations under the strict aliasing rule. DO NOT USE IT WITHOUT TESTING AND THOROUGH UNDERSTANDING OF THE STRICT ALIASING RULE.

// -std=c++17
#include <cstring>
#include <cstddef>
#include <iostream>
#include <vector>
#include <chrono>


// POD: plain old data.
// Idea: design a class that can let you maximally reuse temporary 
// containers during a program.
// Port of vectors of POD types.
template <std::size_t portsize = 42>
class PODvectorPort 
{
  static constexpr std::size_t Xsize = portsize;
  std::size_t signature;
  std::size_t Nlent; // Number of dispatched containers.
  std::vector<std::size_t> X[portsize]; // Container pool.
  PODvectorPort(const PODvectorPort &);
  PODvectorPort & operator=( const PODvectorPort& );
  
  
public:
  std::size_t Ndispatched() { return Nlent; }
  std::size_t showSignature() { return signature; }
  PODvectorPort() // Permuted random number generator.
  { 
    std::size_t state = std::chrono::high_resolution_clock::now().time_since_epoch().count();
    state ^= (uint64_t)(&std::memmove);
    signature = ((state >> 18) ^ state) >> 27;
    std::size_t rot = state >> 59;
    signature = (signature >> rot) | (state << ((-rot) & 31));
    Nlent = 0; 
  }
  
  
  template<typename podvecport>
  friend class PODvectorPortOffice;
};


// Functor that manages the port.
template<typename podvecport>
class PODvectorPortOffice
{
  // Number of already-dispatched containers when the office is set up.
  std::size_t initialNlent; 
  podvecport *p; // Pointer to the port.
  
  
  PODvectorPortOffice( const PODvectorPortOffice& ); // non construction-copyable
  PODvectorPortOffice& operator=( const PODvectorPortOffice& ); // non copyable
  
  
  constexpr void check()
  {
    while (__cplusplus < 201703)
    {
      std::cerr << "PODvectorPortOffice: C++ < 17, Stall." << std::endl;
    }
    
    
    // Check if allocation will be 8-byte (or more) aligned.
    // Intend it not to work on machine < 64-bit.
    constexpr std::size_t aln = alignof(std::max_align_t);
    while (aln < 8)
    {
      std::cerr << "PODvectorPortOffice: Allocation is not at least 8-byte aligned, Stall." <<
        std::endl;
    }
    
    
    while ((aln & (aln - 1)) != 0)
    {
      std::cerr << "PODvectorPortOffice: Alignment is not a power of 2 bytes. Stall." << std::endl;
    }
    
    
    // Random checks to see if sizeof(vector<S>) != sizeof(vector<T>).
    if(true)
    {
      std::size_t vecHeadSize[16] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
      vecHeadSize[0] = sizeof(std::vector<char>(0));
      vecHeadSize[1] = sizeof(std::vector<short>(1));
      vecHeadSize[2] = sizeof(std::vector<int>(2));
      vecHeadSize[3] = sizeof(std::vector<long>(3));
      vecHeadSize[4] = sizeof(std::vector<std::size_t>(5));
      vecHeadSize[5] = sizeof(std::vector<float>(7));
      vecHeadSize[6] = sizeof(std::vector<double>(11));
      vecHeadSize[7] = sizeof(std::vector<std::vector<char> >(13));
      vecHeadSize[8] = sizeof(std::vector<std::vector<int> >(17));
      vecHeadSize[9] = sizeof(std::vector<std::vector<double> >(19));
      struct tmpclass1 { char a; short b; };
      struct tmpclass2 { char a; float b; };
      struct tmpclass3 { char a; double b; };
      struct tmpclass4 { int a; char b; };
      struct tmpclass5 { double a; char b; };
      struct tmpclass6 { double a[5]; char b[3]; short c[3]; };
      vecHeadSize[10] = sizeof(std::vector<tmpclass1>(23));
      vecHeadSize[11] = sizeof(std::vector<tmpclass2>(29));
      vecHeadSize[12] = sizeof(std::vector<tmpclass3>(31));
      vecHeadSize[13] = sizeof(std::vector<tmpclass4>(37));
      vecHeadSize[14] = sizeof(std::vector<tmpclass4>(41));
      vecHeadSize[15] = sizeof(std::vector<tmpclass4>(43));
      
      
      std::size_t notSame = 0;
      for(int i = 0; i < 16; ++i) 
        notSame += vecHeadSize[i] != sizeof(std::size_t) * 3;
      while (notSame)
      {
        std::cerr << "sizeof(std::vector<S>) != sizeof(std::vector<T>), \
PODvectorPortOffice cannot handle. Stall." << std::endl;
      }
    }
  }
  
  
  void recall() { p->Nlent = initialNlent; }
  
  
public:  
  PODvectorPortOffice(podvecport &port) 
  { 
    check();
    p = &port; 
    initialNlent = p->Nlent;
  }
  
  
  template<typename X, typename Y>
  std::vector<X> & repaint(std::vector<Y> &y) // Repaint the container.
    // AFTER A VECTOR IS REPAINTED, DO NOT USE THE OLD VECTOR AGAIN !!
  {
    while (std::is_same<bool, X>::value)
    {
      std::cerr << "PODvectorPortOffice: Cannot repaint the vector to \
std::vector<bool>. Stall." << std::endl;
    }
    std::vector<X> *x;
    std::vector<Y> *yp = &y;
    std::memcpy(&x, &yp, sizeof(x));
    return *x; // Not compliant with strict aliasing rule.
  }
  
  
  template<typename T>
  std::vector<T> & lend()
  {
    while (p->Nlent >= p->Xsize)
    {
      std::cerr << "PODvectorPortOffice: No more containers. Stall." << std::endl;
    }
    ++p->Nlent;
    return repaint<T, std::size_t>( p->X[p->Nlent - 1] );
  }
  
  
  ~PODvectorPortOffice() 
  { 
    // Because p->signature can only be known at runtime, an aggressive,
    // compliant compiler (ACC) will never remove this 
    // branch. Volatile might do, but trustworthiness?
    if(p->signature == 0)
    {
      constexpr std::size_t sizeofvec = sizeof(std::vector<std::size_t>);
      char dummy[sizeofvec * p->Xsize];    
      std::memcpy(dummy, p->X, p->Nlent * sizeofvec);
      std::size_t ticketNum = 0;
      char *xp = (char*)(p->X);
      for(int i = 0, iend = p->Nlent * sizeofvec; i < iend; ++i)
      {
        xp[i] &= xp[iend - i - 1] * 5;
        ticketNum += xp[i] ^ ticketNum;
      }
      std::cerr << "Congratulations! After the port office was decommissioned, \
you found a winning lottery ticket. The odds is less than 2.33e-10. Your \
ticket number is " << ticketNum << std::endl;
      std::memcpy(p->X, dummy, p->Nlent * sizeofvec);
      // According to the strict aliasing rule, a char* can point to any memory
  // block pointed by another pointer of any type T*. Thus given an ACC, 
  // the writes to that block via the char* must be fully acknowledged in 
  // time by T*, namely, for reading contents from T*, a reload instruction 
  // will be kept in the assembly code to achieve a sort of 
  // "register-cache-memory coherence" (RCMC).
  // We also do not care about the renters' (who received the reference via
  // .lend()) RCMC, because PODvectorPortOffice never accesses the contents
  // of those containers.
    }
    recall(); 
  }
};

Any adversarial test case to break it, especially on GCC>=8.3 or MSVS >= 2019, is welcomed!

Upvotes: 3

Views: 313

Answers (1)

Jon Reeves
Jon Reeves

Reputation: 2586

Let me frame this by saying I don't think there's an "authoritative" answer to this question. That said, you've provided enough constraints that a suggested path is at least worthwhile. Let's review the requirements:

  • Solution must use std::vector. This is in my opinion the most unfortunate requirement for reasons I won't get into here.
  • Solution must be standards compliant and not resort to rule violations, like the strict aliasing rule.
  • Solution must either reduce the number of allocations performed, or reduce the overhead of allocations to the point of being negligible.

In my opinion this is definitely a job for a custom allocator. There are a couple of off-the-shelf options that come close to doing what you want, for example the Boost Pool Allocators. The one you're most interested in is boost::pool_allocator. This allocator will create a singleton "pool" for each distinct object size (note: not object type), which grows as needed, but never shrinks until you explicitly purge it.

The main difference between this and your solution is that you'll have distinct pools of memory for objects of different sizes, which means it will use more memory than your posted solution, but in my opinion this is a reasonable trade-off. To be maximally efficient, you could simply start a batch of operations by creating vectors of each needed type with an appropriate size. All subsequent vector operations which use these allocators will do trivial O(1) allocations and deallocations. Roughly in pseudo-code:

// be careful with this, probably want [[nodiscard]], this is code
// is just rough guidance:
void force_pool_sizes(void)
{
    std::vector<int, boost::pool_allocator<int>> size_int_vect;
    std::vector<SomePodSize16, boost::pool_allocator<SomePodSize16>> size_16_vect;
    ...
    
    size_int_vect.resize(100); // probably makes malloc calls
    size_16_vect.resize(200);  // probably makes malloc calls
    ...

    // on return, objects go out of scope, but singleton pools
    // with allocated blocks of memory remain for future use
    // until explicitly purged.
}

void expensive_long_running(void)
{
    force_pool_sizes();

    std::vector<int, boost::pool_allocator<int>> data1;
    ... do stuff, malloc/free will never be called...

    std::vector<SomePodSize16, boost::pool_allocator<SomePodSize16>> data2;
    ... do stuff, malloc/free will never be called...

    // free everything:
    boost::singleton_pool<boost::pool_allocator_tag, sizeof(int)>::release_memory();
}

If you want to take this a step further on being memory efficient, if you know for a fact that certain pool sizes are mutually exclusive, you could modify the boost pool_allocator to use a slightly different singleton backing store which allows you to move a memory block from one block size to another. This is probably out of scope for now, but the boost code itself is straightforward enough, if memory efficiency is critical, it's probably worthwhile.

It's worth pointing out that there's probably some confusion about the strict aliasing rule, especially when it comes to implementing your own memory allocators. There are lots and lots of SO questions about strict aliasing and what it does and doesn't mean. This one is a good place to start.

The key takeaway is that it's perfectly ordinary and acceptable in low level C++ code to take an array of memory and cast it to some object type. If this were not the case, std::allocator wouldn't exist. You also wouldn't have much use for things like std::aligned_storage. Look at the example use case for std::aligned_storage on cppreference. An STL-like static_vector class is created which keeps an array of aligned_storage objects that get recast to a concrete type. Nothing about this is "unacceptable" or "illegal", but it does require some additional knowledge and care in handling.

The reason your solution is especially going to enrage the code lawyers is that you're taking pointers of one non-char object type and casting them to different non-char object types. This is a particularly offensive violation of the strict aliasing rule, but also not really necessary given some of your other options.

Also keep in mind that it's not an error to alias memory, it's a warning. I'm not saying go crazy with aliasing, but I am saying that as with all things C and C++, there are justifiable cases to break rules, when you have very thorough knowledge and understanding of both your compiler and the machine you're running on. Just be prepared for some very long and painful debug sessions if it turns out you didn't in fact know those two things as well as you thought you did.

Upvotes: 2

Related Questions