Reputation: 93324

Algorithm to unify contiguous chunks in a collection of chunks

I'm creating a pre-allocator with dynamic memory chunk size, and I need to unify contiguous memory chunks.

struct Chunk // Chunk of memory
{
    Ptr begin, end; // [begin, end) range
}

struct PreAlloc
{
    std::vector<Chunk> chunks; // I need to unify contiguous chunks here
    ...
}

I tried a naive solution, that, after sorting the chunks based on their begin, basically did a pass through the vector checking if the next chunk's begin was equal to the current chunk's end. I'm sure it could be improved.

Is there a good algorithm to unify contiguous ranges?

Information:

Chunks can never "overlap".
Chunks can have any size greater than 0.
Performance is the most important factor.

Upvotes: 8

Answers (5)

DanielKO

Reputation: 4527

NOTE: there was an error in my original algorithm, where I only considered blocks to the left of the current block.

Use two associative tables (e.g. unordered_map), one mapping the begin address to the Chunk, another mapping the end to the Chunk. This lets you find the neighbouring blocks quickly. Alternatively, you can change the Chunk struct to store a pointer/id/whatever to the neighbouring Chunk, plus a flag to mark to tell if it's free.

The algorithm consists of scanning the vector of chunks once, while maintaining the invariant: if there is a neighbour to the left, you merge them; if there is a neighbour to the right, you merge them. At the end, just collect the remaining chunks.

Here's the code:

void unify(vector<Chunk>& chunks)
{
    unordered_map<Ptr, Chunk> begins(chunks.size() * 1.25); // tweak this
    unordered_map<Ptr, Chunk> ends(chunks.size() * 1.25); // tweak this

    for (Chunk c : chunks) {        
        // check the left
        auto left = ends.find(c.begin);
        if (left != ends.end()) { // found something to the left
            Chunk neighbour = left->second;
            c.begin = neighbour.begin;
            begins.erase(neighbour.begin);
            ends.erase(left);
        }
        // check the right
        auto right = begins.find(c.end);
        if (right != begins.end()) { // found something to the right
            Chunk neighbour = right->second;
            c.end = neighbour.end;
            begins.erase(right);
            ends.erase(neighbour.end);
        }

        begins[c.begin] = c;
        ends[c.end] = c;
    }
    chunks.clear();
    for (auto x : begins)
        chunks.push_back(x.second);
}

The algorithm has O(n) complexity assuming constant time access to the begins and ends tables (which is nearly what you get if you don't trigger rehashing, hence the "tweak this" comments). There are quite a few options to implement associative tables, make sure to try a few different alternatives; as pointed out in the comment by Ben Jackson, a hash table doesn't always make good use of cache, so even a sorted vector with binary searches might be faster.

If you can change the Chunk structure to store left/right pointers, you get a guaranteed O(1) lookup/insert/remove. Assuming you are doing this to consolidate free chunks of memory, the left/right checking can be done in O(1) during the free() call, so there is no need to consolidate it afterwards.

Upvotes: 8

user2249683

Reputation:

I think you can not do better then N log(N) - the naive approach. The idea using an unordered associative container I dislike - the hashing will degenerate performance. An improvement might be: keep the chunks sorted at each insert, making 'unify' O(N).

It seems you are writing some allocator, hence I dig up some old code of mine (with some adjustment regarding C++ 11 and without any warranty). The allocator is for small objects having a size <= 32 * sizeof(void*).

Code:

// Copyright (c) 1999, Dieter Lucking.
//
// Permission is hereby granted, free of charge, to any person or organization
// obtaining a copy of the software and accompanying documentation covered by
// this license (the "Software") to use, reproduce, display, distribute,
// execute, and transmit the Software, and to prepare derivative works of the
// Software, and to permit third-parties to whom the Software is furnished to
// do so, all subject to the following:
//
// The copyright notices in the Software and this entire statement, including
// the above license grant, this restriction and the following disclaimer,
// must be included in all copies of the Software, in whole or in part, and
// all derivative works of the Software, unless such copies or derivative
// works are solely in the form of machine-executable object code generated by
// a source language processor.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
// SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
// FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
// DEALINGS IN THE SOFTWARE.
//

#include <limits>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// raw_allocator
// =============================================================================

class raw_allocator
{
    // Types
    // =====

    public:
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;
    typedef void value_type;
    typedef void* pointer;
    typedef const void* const_pointer;
    typedef unsigned char byte_type;
    typedef byte_type* byte_pointer;
    typedef const unsigned char* const_byte_pointer;

    // Information
    // ===========

    public:
    static size_type max_size() noexcept {
        return std::numeric_limits<size_type>::max();
    }
    static size_type mem_size(size_type) noexcept;

    // Allocation.System
    // =================

    public:
    static pointer system_allocate(size_type) noexcept;
    static void system_allocate(size_type, pointer&, size_type&) noexcept;
    static void system_deallocate(pointer) noexcept;

    // Allocation
    // ==========

    public:
    static void allocate(size_type, pointer& result, size_type& capacity) noexcept;
    static pointer allocate(size_type n) noexcept {
        pointer result;
        allocate(n, result, n);
        return result;
    }

    static void deallocate(pointer p, size_type n) noexcept;

    // Allocation.Temporary:
    //======================

    public:
    static void allocate_temporary(size_type, pointer& result,
        size_type& capacity) noexcept;
    static pointer allocate_temporary(size_type n) noexcept {
        pointer result;
        allocate_temporary(n, result, n);
        return result;
    }

    static void deallocate_temporary(pointer, size_type) noexcept;

    // Logging
    // =======

    public:
    static void log(std::ostream& stream);
};

// static_allocator
// =============================================================================

template<class T> class static_allocator;

template<>
class static_allocator<void>
{
    public:
    typedef void value_type;
    typedef void* pointer;
    typedef const void* const_pointer;

    template<class U> struct rebind
    {
        typedef static_allocator<U> other;
    };
};

template<class T>
class static_allocator
{
    // Types
    // =====

    public:
    typedef raw_allocator::size_type size_type;
    typedef raw_allocator::difference_type difference_type;
    typedef T value_type;
    typedef T& reference;
    typedef const T& const_reference;

    typedef T* pointer;
    typedef const T* const_pointer;

    template<class U> struct rebind
    {
        typedef static_allocator<U> other;
    };

    // Construction/Destruction
    // ========================

    public:
    static_allocator() noexcept {};
    static_allocator(const static_allocator&) noexcept {};
    ~static_allocator() noexcept {};

    // Information
    // ===========

    public:
    static size_type max_size() noexcept {
        return raw_allocator::max_size() / sizeof(T);
    }
    static size_type mem_size(size_type n) noexcept {
        return raw_allocator::mem_size(n * sizeof(T)) / sizeof(T);
    }
    static pointer address(reference x) {
        return &x;
    }
    static const_pointer address(const_reference x) {
        return &x;
    }

    // Construct/Destroy
    //==================

    public:
    static void construct(pointer p, const T& value) {
        new ((void*) p) T(value);
    }
    static void destroy(pointer p) {
        ((T*) p)->~T();
    }

    // Allocation
    //===========

    public:
    static pointer allocate(size_type n) noexcept {
        return (pointer)raw_allocator::allocate(n * sizeof(value_type));
    }

    static void allocate(size_type n, pointer& result, size_type& capacity) noexcept
    {
        raw_allocator::pointer p;
        raw_allocator::allocate(n * sizeof(value_type), p, capacity);
        result = (pointer)(p);
        capacity /= sizeof(value_type);
    }

    static void deallocate(pointer p, size_type n) noexcept {
        raw_allocator::deallocate(p, n * sizeof(value_type));
    }

    // Allocation.Temporary
    // ====================

    static pointer allocate_temporary(size_type n) noexcept {
        return (pointer)raw_allocator::allocate_temporary(n * sizeof(value_type));
    }

    static void allocate_temporary(size_type n, pointer& result,
        size_type& capacity) noexcept
    {
        raw_allocator::pointer p;
        raw_allocator::allocate_temporary(n * sizeof(value_type), p, capacity);
        result = (pointer)(p);
        capacity /= sizeof(value_type);
    }

    static void deallocate_temporary(pointer p, size_type n) noexcept {
        raw_allocator::deallocate_temporary(p, n);
    }

    // Logging
    // =======

    public:
    static void log(std::ostream& stream) {
        raw_allocator::log(stream);
    }
};

template <class T1, class T2>
inline bool operator ==(const static_allocator<T1>&,
    const static_allocator<T2>&) noexcept {
    return true;
}
template <class T1, class T2>
inline bool operator !=(const static_allocator<T1>&,
    const static_allocator<T2>&) noexcept {
    return false;
}

// allocator:
// =============================================================================

template<class T> class allocator;

template<>
class allocator<void>
{
    public:
    typedef static_allocator<void>::value_type value_type;
    typedef static_allocator<void>::pointer pointer;
    typedef static_allocator<void>::const_pointer const_pointer;

    template<class U> struct rebind
    {
        typedef allocator<U> other;
    };
};

template<class T>
class allocator
{
    // Types
    // =====

    public:
    typedef typename static_allocator<T>::size_type size_type;
    typedef typename static_allocator<T>::difference_type difference_type;
    typedef typename static_allocator<T>::value_type value_type;
    typedef typename static_allocator<T>::reference reference;
    typedef typename static_allocator<T>::const_reference const_reference;
    typedef typename static_allocator<T>::pointer pointer;
    typedef typename static_allocator<T>::const_pointer const_pointer;

    template<class U> struct rebind
    {
        typedef allocator<U> other;
    };

    // Constructor/Destructor
    // ======================

    public:
    template <class U>
    allocator(const allocator<U>&) noexcept {}
    allocator() noexcept {};
    allocator(const allocator&) noexcept {};
    ~allocator() noexcept {};

    // Information
    // ===========

    public:
    size_type max_size() const noexcept {
        return static_allocator<T>::max_size();
    }
    pointer address(reference x) const {
        return static_allocator<T>::address(x);
    }
    const_pointer address(const_reference x) const {
        return static_allocator<T>::address(x);
    }

    // Construct/Destroy
    // =================

    public:
    void construct(pointer p, const T& value) {
        static_allocator<T>::construct(p, value);
    }
    void destroy(pointer p) {
        static_allocator<T>::destroy(p);
    }

    // Allocation
    // ==========

    public:
    pointer allocate(size_type n, typename allocator<void>::const_pointer = 0) {
        return static_allocator<T>::allocate(n);
    }
    void deallocate(pointer p, size_type n) {
        static_allocator<T>::deallocate(p, n);
    }

    // Logging
    // =======

    public:
    static void log(std::ostream& stream) {
        raw_allocator::log(stream);
    }
};

template <class T1, class T2>
inline bool operator ==(const allocator<T1>&, const allocator<T2>&) noexcept {
    return true;
}

template <class T1, class T2>
inline bool operator !=(const allocator<T1>&, const allocator<T2>&) noexcept {
    return false;
}

// Types
// =============================================================================

typedef raw_allocator::size_type size_type;
typedef raw_allocator::byte_pointer BytePointer;

struct LinkType
{
    LinkType* Link;
};

struct FreelistType
{
    LinkType* Link;
};

// const
// =============================================================================

// Memory layout:
// ==============
//
// Freelist
// Index          Request                            Alignment
// =============================================================================
// [ 0 ...  7]    [    0 * align ...    8 * align]   every 1 * align bytes
// [ 8 ... 11]    (    8 * align ...   16 * align]   every 2 * align bytes
// [12 ... 13]    (   16 * align ...   24 * align]   every 4 * align bytes
// [14]           (   24 * align ...   32 * align]         8 * align bytes
//
// temporary memory:
// [15]           [    0 * align ...  256 * align]   256 * align

static const unsigned FreeListArraySize = 16;
static const size_type FreelistInitSize = 16;
static const size_type MinAlign =
    (8 < 2 * sizeof(void*)) ? 2 * sizeof(void*) : 8;
static const size_type MaxAlign = 32 * MinAlign;
static const size_type MaxIndex = 14;
static const size_type TmpIndex = 15;
static const size_type TmpAlign = 256 * MinAlign;

static const size_type IndexTable[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 9, 10,
    10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14 };
static_assert(sizeof(IndexTable) / sizeof(size_type) == MaxAlign / MinAlign, "Invalid Index Table");

inline size_type get_index(size_type n) {
    return IndexTable[long(n - 1) / MinAlign];
}

static const size_type AlignTable[] = { MinAlign * 1, MinAlign * 2, MinAlign
    * 3, MinAlign * 4, MinAlign * 5, MinAlign * 6, MinAlign * 7, MinAlign * 8,
    MinAlign * 10, MinAlign * 12, MinAlign * 14, MinAlign * 16, MinAlign * 20,
    MinAlign * 24, MinAlign * 32, TmpAlign, };
static_assert(sizeof(AlignTable) / sizeof(size_type) == TmpIndex + 1, "Invalid Align Table");

inline size_type get_align(size_type i) {
    return AlignTable[i];
}

// Thread
// ============================================================================

static LinkType* Freelist[FreeListArraySize];
static BytePointer HeapBeg;
static BytePointer HeapEnd;
static size_type TotalHeapSize;

static std::mutex FreelistMutex[FreeListArraySize] = { };
inline void lock_free_list(size_type i) {
    FreelistMutex[i].lock();
}

inline void unlock_free_list(size_type i) {
    FreelistMutex[i].unlock();
}

// Allocation
// ============================================================================

// Requiers: freelist[index] is locked
LinkType* allocate_free_list(size_type index) noexcept {
    static std::mutex mutex;
    const size_type page_size = 4096; // FIXME some system_page_size();
    std::lock_guard<std::mutex> guard(mutex);
    size_type heap_size = HeapEnd - HeapBeg;
    size_type align = get_align(index);
    if(heap_size < align) {
        LinkType* new_list = (LinkType*)(HeapBeg);
        // If a temporary list:
        if(MaxAlign <= heap_size) {
            LinkType* current = new_list;
            LinkType* next;
            while(2*MaxAlign <= heap_size) {
                next = (LinkType*)(BytePointer(current) + MaxAlign);
                current->Link = next;
                current = next;
                heap_size -= MaxAlign;
            }
            if(index != MaxIndex) lock_free_list(MaxIndex);
            current->Link = Freelist[MaxIndex];
            Freelist[MaxIndex] = new_list;
            if(index != MaxIndex) unlock_free_list(MaxIndex);
            new_list = (LinkType*)(BytePointer(current) + MaxAlign);
            heap_size -= MaxAlign;
        }
        if(MinAlign <= heap_size) {
            std::cout << "heap_size: " << heap_size << std::endl;
            size_type i = get_index(heap_size);
            if(heap_size < get_align(i)) --i;
            if(index != i) lock_free_list(i);
            new_list->Link = Freelist[i];
            Freelist[i] = new_list;
            if(index != i) unlock_free_list(i);
        }
        heap_size = FreelistInitSize * align + TotalHeapSize / FreelistInitSize;
        heap_size = (((heap_size - 1) / page_size) + 1) * page_size;
        HeapBeg = BytePointer(raw_allocator::system_allocate(heap_size));
        if(HeapBeg) {
            HeapEnd = HeapBeg + heap_size;
            TotalHeapSize += heap_size;
        }
        else {
            HeapEnd = 0;
            size_type i = FreeListArraySize;
            while(HeapBeg == 0) {
                --i;
                if(i <= index) return 0;
                lock_free_list(i);
                if(Freelist[i]) {
                    heap_size = get_align(i);
                    HeapBeg = (BytePointer)(Freelist[i]);
                    HeapEnd = HeapBeg + heap_size;
                    Freelist[i] = Freelist[i]->Link;
                }
                unlock_free_list(i);
            }
        }
    }

    size_type size = FreelistInitSize * align;
    size_type count = FreelistInitSize;
    if(heap_size < size) {
        count = heap_size / align;
        size = align * count;
    }
    LinkType* beg_list = (LinkType*)(HeapBeg);
    LinkType* end_list = beg_list;
    while(--count) {
        LinkType* init = (LinkType*)(BytePointer(end_list) + align);
        end_list->Link = init;
        end_list = init;
    }
    LinkType*& freelist = Freelist[index];
    end_list->Link = freelist;
    freelist = beg_list;
    HeapBeg += size;
    return freelist;
}

// raw_allocator
// ============================================================================

// size
// ====

raw_allocator::size_type
raw_allocator::mem_size(size_type n) noexcept {
    if( ! n) return 0;
    else {
        if(n <= MaxAlign) return get_align(get_index(n));
        else return ((difference_type(n) - 1) / difference_type(MaxAlign)) * MaxAlign
        + MaxAlign;
    }
}

// allocation.system
// =================

raw_allocator::pointer raw_allocator::system_allocate(size_type n) noexcept
{
    return ::malloc(n);
}

void raw_allocator::system_allocate(size_type n, pointer& p, size_type& capacity) noexcept
{
    capacity = mem_size(n);
    p = ::malloc(capacity);
    if(p == 0) capacity = 0;
}

void raw_allocator::system_deallocate(pointer p) noexcept {
    ::free(p);
}

// allocation
// ==========

void raw_allocator::allocate(size_type n, pointer& p, size_type& capacity) noexcept
{
    if(n == 0 || MaxAlign < n) system_allocate(n, p, capacity);
    else {
        p = 0;
        capacity = 0;
        size_type index = get_index(n);
        lock_free_list(index);
        LinkType*& freelist = Freelist[index];
        if(freelist == 0) {
            freelist = allocate_free_list(index);
        }
        if(freelist != 0) {
            p = freelist;
            capacity = get_align(index);
            freelist = freelist->Link;
        }
        unlock_free_list(index);
    }
}

void raw_allocator::deallocate(pointer p, size_type n) noexcept {
    if(p) {
        if(n == 0 || MaxAlign < n) system_deallocate(p);
        else {
            size_type index = get_index(n);
            lock_free_list(index);
            LinkType*& freelist = Freelist[index];
            LinkType* new_list = ((LinkType*)(p));
            new_list->Link = freelist;
            freelist = new_list;
            unlock_free_list(index);
        }
    }
}

// allocation.temporary
// ====================

void raw_allocator::allocate_temporary(size_type n, pointer& p,
    size_type& capacity) noexcept
{
    if(n == 0 || size_type(TmpAlign) < n) system_allocate(n, p, capacity);
    else {
        p = 0;
        capacity = 0;
        lock_free_list(TmpIndex);
        LinkType*& freelist = Freelist[TmpIndex];
        if(freelist == 0) freelist = allocate_free_list(TmpIndex);
        if(freelist != 0) {
            p = freelist;
            freelist = freelist->Link;
            capacity = TmpAlign;
        }
        unlock_free_list(TmpIndex);
    }
}

void raw_allocator::deallocate_temporary(pointer p, size_type n) noexcept {
    if(p) {
        if(n == 0 || size_type(TmpAlign) < n) system_deallocate(p);
        else {
            lock_free_list(TmpIndex);
            LinkType*& freelist = Freelist[TmpIndex];
            LinkType* new_list = ((LinkType*)(p));
            new_list->Link = freelist;
            freelist = new_list;
            unlock_free_list(TmpIndex);
        }
    }
}

void raw_allocator::log(std::ostream& stream) {
    stream << " Heap Size:       " << TotalHeapSize << '\n';
    size_type total_size = 0;
    for (unsigned i = 0; i < FreeListArraySize; ++i) {
        size_type align = get_align(i);
        size_type size = 0;
        size_type count = 0;
        lock_free_list(i);
        LinkType* freelist = Freelist[i];
        while (freelist) {
            size += align;
            ++count;
            freelist = freelist->Link;
        }
        total_size += size;
        unlock_free_list(i);
        stream << "  Freelist: " << std::setw(4) << align << ": " << size
            << " [" << count << ']' << '\n';
    }
    size_type heap_size = HeapEnd - HeapBeg;
    stream << " Freelists:       " << total_size << '\n';
    stream << " Free Heap:       " << heap_size << '\n';
    stream << " Allocated:       " << TotalHeapSize - total_size - heap_size
        << '\n';
}

int main() {
    const unsigned sample_count = 100000;
    std::vector<char*> std_allocate_pointers;
    std::vector<char*> allocate_pointers;
    std::vector<unsigned> sample_sizes;

    typedef std::chrono::nanoseconds duration;
    duration std_allocate_duration;
    duration std_deallocate_duration;
    duration allocate_duration;
    duration deallocate_duration;

    std::allocator<char> std_allocator;
    allocator<char> allocator;
    for (unsigned i = 0; i < sample_count; ++i) {
        if (std::rand() % 2) {
            unsigned size = unsigned(std::rand()) % MaxAlign;
            //std::cout << "  Allocate: " << size << std::endl;
            sample_sizes.push_back(size);
            {
                auto start = std::chrono::high_resolution_clock::now();
                auto p = std_allocator.allocate(size);
                auto end = std::chrono::high_resolution_clock::now();
                std_allocate_pointers.push_back(p);
                std_allocate_duration += std::chrono::duration_cast<duration>(
                    end - start);
            }
            {
                auto start = std::chrono::high_resolution_clock::now();
                auto p = allocator.allocate(size);
                auto end = std::chrono::high_resolution_clock::now();
                allocate_pointers.push_back(p);
                allocate_duration += std::chrono::duration_cast<duration>(
                    end - start);
            }
        }
        else {
            if (!sample_sizes.empty()) {
                char* std_p = std_allocate_pointers.back();
                char* p = allocate_pointers.back();
                unsigned size = sample_sizes.back();
                //std::cout << "Deallocate: " << size << std::endl;
                {
                    auto start = std::chrono::high_resolution_clock::now();
                    std_allocator.deallocate(std_p, size);
                    auto end = std::chrono::high_resolution_clock::now();
                    std_deallocate_duration += std::chrono::duration_cast<
                        duration>(end - start);
                }
                {
                    auto start = std::chrono::high_resolution_clock::now();
                    allocator.deallocate(p, size);
                    auto end = std::chrono::high_resolution_clock::now();
                    deallocate_duration += std::chrono::duration_cast<duration>(
                        end - start);
                }
                std_allocate_pointers.pop_back();
                allocate_pointers.pop_back();
                sample_sizes.pop_back();
            }
        }
    }
    for (unsigned i = 0; i < sample_sizes.size(); ++i) {
        unsigned size = sample_sizes[i];
        std_allocator.deallocate(std_allocate_pointers[i], size);
        allocator.deallocate(allocate_pointers[i], size);

    }
    std::cout << "std_allocator: "
        << (std_allocate_duration + std_deallocate_duration).count() << " "
        << std_allocate_duration.count() << " "
        << std_deallocate_duration.count() << std::endl;
    std::cout << "    allocator: "
        << (allocate_duration + deallocate_duration).count() << " "
        << allocate_duration.count() << " " << deallocate_duration.count()
        << std::endl;
    raw_allocator::log(std::cout);
    return 0;
}

Note: The raw allocator never release memory to the system (That might be a bug).
Note: Without optimizations enabled the performance is lousy (g++ -std=c++11 -O3 ...)

Result:

std_allocator: 11645000 7416000 4229000
    allocator: 5155000 2758000 2397000
 Heap Size:       94208
  Freelist:   16: 256 [16]
  Freelist:   32: 640 [20]
  Freelist:   48: 768 [16]
  Freelist:   64: 1024 [16]
  Freelist:   80: 1280 [16]
  Freelist:   96: 1536 [16]
  Freelist:  112: 1792 [16]
  Freelist:  128: 2176 [17]
  Freelist:  160: 5760 [36]
  Freelist:  192: 6144 [32]
  Freelist:  224: 3584 [16]
  Freelist:  256: 7936 [31]
  Freelist:  320: 10240 [32]
  Freelist:  384: 14208 [37]
  Freelist:  512: 34304 [67]
  Freelist: 4096: 0 [0]
 Freelists:       91648
 Free Heap:       2560
 Allocated:       0

Upvotes: 4

zarzare

Reputation: 166

It seemed like an interesting problem so I invested some time in it. The aproach you took is far from being naive. Actually it has pretty good results. It can definetly be optimized further though. I will assume the list of chunks is not already sorted because your algo is probably optimal then. To optimize it my aproach was to optimize the sort itself eliminating the chunks that can be combined during the sort, thus making the sort faster for the remaining elements. The code below is basically a modified version of bubble-sort. I also implemented your solution using std::sort just for comparison. The results are suprisingly good using my also. For a data set of 10 million chunks the combined sort with the merge of chunks performs 20 times faster. The output of the code is (algo1 is std::sort followed by merging consecutive elements, algo 2 is the sort optimized with removing the chunks that can be merged):

generating input took:  00:00:19.655999
algo 1 took 00:00:00.968738
  initial chunks count: 10000000, output chunks count: 3332578
algo 2 took 00:00:00.046875
  initial chunks count: 10000000, output chunks count: 3332578

You can probably improve it further using a better sort algo like introsort.

full code:

#include <vector>
#include <map>
#include <set>
#include <iostream>
#include <boost\date_time.hpp>

#define CHUNK_COUNT 10000000

struct Chunk // Chunk of memory
{
    char *begin, *end; // [begin, end) range

    bool operator<(const Chunk& rhs) const
    {
        return begin < rhs.begin;
    }
};

std::vector<Chunk> in;

void generate_input_data()
{
    std::multimap<int, Chunk> input_data;
    Chunk chunk;
    chunk.begin = 0;
    chunk.end = 0;
    for (int i = 0; i < CHUNK_COUNT; ++i)
    {
        int continuous = rand() % 3; // 66% chance of a chunk being continuous
        if (continuous)
            chunk.begin = chunk.end;
        else
            chunk.begin = chunk.end + rand() % 100 + 1;
        int chunk_size = rand() % 100 + 1;
        chunk.end = chunk.begin + chunk_size;
        input_data.insert(std::multimap<int, Chunk>::value_type(rand(), chunk));
    }
    // now we have the chunks randomly ordered in the map
    // will copy them in the input vector
    for (std::multimap<int, Chunk>::const_iterator it = input_data.begin(); it != input_data.end(); ++it)
        in.push_back(it->second);
}

void merge_chunks_sorted(std::vector<Chunk>& chunks)
{
    if (in.empty())
        return;
    std::vector<Chunk> res;
    Chunk ch = in[0];
    for (size_t i = 1; i < in.size(); ++i)
    {
        if (in[i].begin == ch.end)
        {
            ch.end = in[i].end;
        } else
        {
            res.push_back(ch);
            ch = in[i];
        }
    }
    res.push_back(ch);
    chunks = res;
}

void merge_chunks_orig_algo(std::vector<Chunk>& chunks)
{
    std::sort(in.begin(), in.end());
    merge_chunks_sorted(chunks);
}

void merge_chunks_new_algo(std::vector<Chunk>& chunks)
{
    size_t new_last_n = 0;
    Chunk temp;
    do {
        int last_n = new_last_n;
        new_last_n = chunks.size() - 1;
        for (int i = chunks.size() - 2; i >= last_n; --i)
        {
            if (chunks[i].begin > chunks[i + 1].begin)
            {
                if (chunks[i].begin == chunks[i + 1].end)
                {
                    chunks[i].begin = chunks[i + 1].begin;
                    if (i + 1 != chunks.size() - 1)
                        chunks[i + 1] = chunks[chunks.size() - 1];
                    chunks.pop_back();
                } else
                {
                    temp = chunks[i];
                    chunks[i] = chunks[i + 1];
                    chunks[i + 1] = temp;
                }
                new_last_n = i + 1;
            } else
            {
                if (chunks[i].end == chunks[i + 1].begin)
                {
                    chunks[i].end = chunks[i + 1].end;
                    if (i + 1 != chunks.size() - 1)
                        chunks[i + 1] = chunks[chunks.size() - 1];
                    chunks.pop_back();
                }
            }
        }
    } while (new_last_n < chunks.size() - 1);
}

void run_algo(void (*algo)(std::vector<Chunk>&))
{
    static int count = 1;
    // allowing the algo to modify the input vector is intentional
    std::vector<Chunk> chunks = in;
    size_t in_count = chunks.size();
    boost::posix_time::ptime start = boost::posix_time::microsec_clock::local_time();
    algo(chunks);
    boost::posix_time::ptime stop = boost::posix_time::microsec_clock::local_time();
    std::cout<<"algo "<<count++<<" took "<<stop - start<<std::endl;
    // if all went ok, statistically we should have around 33% of the original chunks count in the output vector
    std::cout<<"  initial chunks count: "<<in_count<<", output chunks count: "<<chunks.size()<<std::endl;
}

int main()
{
    boost::posix_time::ptime start = boost::posix_time::microsec_clock::local_time();
    generate_input_data();
    boost::posix_time::ptime stop = boost::posix_time::microsec_clock::local_time();
    std::cout<<"generating input took:\t"<<stop - start<<std::endl;

    run_algo(merge_chunks_orig_algo);
    run_algo(merge_chunks_new_algo);

    return 0;
}

I've seen below you mention n is not that high. so I rerun the test with 1000 chunks, 1000000 runs to make the times significant. The modified bubble sort still performs 5 times better. Basically for 1000 chunks total run time is 3 microseconds. Numbers below.

generating input took:  00:00:00
algo 1 took 00:00:15.343456, for 1000000 runs
  initial chunks count: 1000, output chunks count: 355
algo 2 took 00:00:03.374935, for 1000000 runs
  initial chunks count: 1000, output chunks count: 355

Upvotes: 3

smossen

Reputation: 887

Add pointers to the chunk struct for previous and next adjacent chunk in contiguous memory, if such exists, null otherwise. When a chunk is released you check if adjacent chunks are free, and if they are you merge them and update prev->next and next->prev pointers. This procedure is O(1) and you do it each time a chunk is released.

Some memory allocators put the size of current and previous chunk at the memory position immediately before the address returned by malloc. It is then possible calculate the offset to adjacent chunks without explicit pointers.

Upvotes: 2

Andrew Tomazos

Reputation: 68698

The following doesn't require sorted input or provide sorted output. Treat the input as a stack. Pop a chunk off and check if it is adjacent to a member of the initially empty output set. If not, add it to the output set. If it is adjacent, remove the adjacent chunk from the output set and push the new combined chunk onto the input stack. Repeat until input is empty.

vector<Chunk> unify_contiguous(vector<Chunk> input)
{
    vector<Chunk> output;

    unordered_set<Ptr, Chunk> begins;
    unordered_set<Ptr, Chunk> ends;

    while (!input.empty())
    {
        // pop chunk from input
        auto chunk = input.back();
        input.pop_back();

        // chunk end adjacent to an output begin?
        auto it = begins.find(chunk.end);
        if (it != begins.end())
        {
            auto end = it->second.end;

            Chunk combined{chunk.begin, end};
            ends.erase(end);
            begins.erase(it);
            input.push_back(combined);
            continue;
        }

        // chunk begin adjacent to an output end?
        it = ends.find(chunk.begin);
        if (it != ends.end())
        {
             auto begin = it->second.begin;

             Chunk combined{begin, chunk.end};
             begins.erase(begin);
             ends.erase(it);
             input.push_back(combined);
             continue;
        }

        // if not add chunk to output    
        begins[chunk.begin] = chunk;
        ends[chunk.end] = chunk;
     }

     // collect output
     for (auto kv : begins)
        output.push_back(kv.second);

     return output;
}

Upvotes: 0

Algorithm to unify contiguous chunks in a collection of chunks

Answers (5)

Related Questions