gdiquest

Reputation: 125

How to handle allocation/deallocation for small objects of variable size in C++

I am currently writing C++ code to store and retrieve tabular data (e.g. a spreadsheet) in memory. The data is loaded from a database. The user can work with the data and there is also a GUI class which should render the tabular data. The GUI renders only a few rows at once, but the tabular data could contain 100,000s of rows at once.

My classes look like this:

With this design a table with 40 columns and 200k rows contains over 8 million objects. After some experiments I saw that allocating and deallocating 8 million objects is a very time-consuming task. Some research showed that other people are using custom allocators (like Boost's pool_allocator) to solve that problem. The problem is that I can't use them in my problem domain, since their performance boost comes from relying on the fact that all allocated objects have the same size. This is not the case in my code, since my objects differ in size.
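A per-cell design like the one described (one heap-allocated object per cell, with objects of differing size) might look roughly like this; all class and member names here are assumptions, not the original code:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: every cell is a separately heap-allocated object
// behind a base-class pointer, so a table with R rows and C columns
// performs R*C allocations (40 * 200,000 = 8 million in the question).
struct Field {
    virtual ~Field() = default;
};

struct IntField : Field {
    int value;
    explicit IntField(int v) : value(v) {}
};

struct StringField : Field {
    std::string value;  // variable size -> cell objects differ in size
    explicit StringField(std::string v) : value(std::move(v)) {}
};

using Row = std::vector<std::unique_ptr<Field>>;

struct Table {
    std::vector<Row> rows;
};
```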

Are there any other techniques I could use for memory management? Or do you have suggestions about the design?

Any help would be greatly appreciated!

Cheers, gdiquest

Edit: In the meantime I found out what my problem was. I started my program from Visual Studio, which means that the debugger was attached to both the debug build and the release build. With a debugger attached, my executable uses a so-called debug heap, which is very slow (further details here). When I start my program without a debugger attached, everything is as fast as I would have expected.

Thank you all for participating in this question!

Upvotes: 3

Views: 837

Answers (2)

Wes

Reputation: 296

Why not just allocate 40 large blocks of memory, one for each column? Most of the columns will hold fixed-length data, which makes them easy and fast, e.g. vector<int> col1(200000). For the variable-length ones just use vector<string> col5(200000). The small string optimization (SSO) will ensure that your short strings require no extra allocation; only rows with longer strings (generally more than 15 characters) will require allocations.
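The column-per-vector idea might be sketched like this (the column names and row count are just illustrations, not from the question):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One contiguous block per column instead of one heap object per cell.
// Fixed-width columns are plain vectors; the variable-width column holds
// std::string, where short values live inline thanks to SSO.
struct ColumnarTable {
    std::vector<int>         id;     // fixed-width column
    std::vector<double>      price;  // fixed-width column
    std::vector<std::string> name;   // variable-width column, SSO-friendly

    explicit ColumnarTable(std::size_t rows)
        : id(rows), price(rows), name(rows) {}
};
```

Loading then becomes a per-column fill: 40 vector constructions up front instead of 8 million individual cell allocations.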

If your variable-length columns are not storing strings then you could also use vector<vector<unsigned char>>. This also allows a nice pre-allocation strategy. E.g. assuming your biggest variable-length field in this column is 100 bytes, you could do:

    // Preallocate every cell's buffer up front, so no further
    // allocations happen while the column is being filled.
    vector<vector<unsigned char>> col2(200000);
    for (auto& cell : col2)
    {
        cell.resize(100);
    }

Now you have a preallocated column that supports 200,000 rows with a maximum data length of 100 bytes per cell. I would still go with the std::string version if you can, though, as it is conceptually simpler.

Upvotes: 2

ArtemGr

Reputation: 12547

Try rapidjson allocators; they are not limited to objects of the same size, AFAIK.

You might attach an allocator to a table and allocate all table objects with it.
For more granularity, you might have row or column pools.
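The per-table pool idea can be sketched with a hand-rolled bump ("arena") allocator. This is only an illustration of the technique, not rapidjson's actual API: variable-sized allocations are carved out of large chunks, there is no per-object free, and all of a table's memory disappears together when its arena is destroyed.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Minimal bump/arena allocator sketch: hand out variable-sized slices of
// large chunks; free everything at once by destroying the arena.
class Arena {
public:
    explicit Arena(std::size_t chunkSize = 64 * 1024)
        : chunkSize_(chunkSize) {}

    // Returns storage for n bytes with the requested alignment
    // (align must be a power of two).
    void* allocate(std::size_t n,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (chunks_.empty() || offset + n > chunkSize_) {
            // Start a new chunk; oversized requests get their own chunk.
            chunks_.push_back(
                std::make_unique<std::byte[]>(std::max(chunkSize_, n)));
            offset = 0;
        }
        used_ = offset + n;
        return chunks_.back().get() + offset;
    }
    // Intentionally no per-object deallocate: the whole table's memory
    // is released together when the Arena goes out of scope.

private:
    std::size_t chunkSize_;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<std::byte[]>> chunks_;
};
```

A table could own one Arena and placement-new all its cell objects into it, so teardown is a single bulk release instead of millions of individual frees.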

Apache does this, attaching all data to request and connection pools.

If you want them to be STL-compatible then perhaps this answer will help to integrate them, although I'm not sure. (I plan to try something like this myself, but haven't gotten to it yet).

Also, some allocators might be faster than the one your system offers by default; TCMalloc, for example (see also). So you might want to profile and see whether using a different general-purpose allocator helps.

Upvotes: 1
