Brandon

Reputation: 23500

Why is std::string allocating twice?

I wrote a custom allocator for std::string and std::vector as follows:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

template <typename T>
struct PSAllocator
{
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef T value_type;

    template<typename U>
    struct rebind {typedef PSAllocator<U> other;};

    PSAllocator() throw() {};
    PSAllocator(const PSAllocator& other) throw() {};

    template<typename U>
    PSAllocator(const PSAllocator<U>& other) throw() {};

    template<typename U>
    PSAllocator& operator = (const PSAllocator<U>& other) { return *this; }
    PSAllocator<T>& operator = (const PSAllocator& other) { return *this; }
    ~PSAllocator() {}


    pointer allocate(size_type n, const void* hint = 0)
    {
        // Request raw storage for n objects of type T and log its address.
        void* raw = ::operator new(n * sizeof(value_type));
        std::cout<<"Allocated: "<<raw<<" of size: "<<n<<"\n";
        return static_cast<pointer>(raw);
    }

    void deallocate(T* ptr, size_type n)
    {
        // Log through void* so that a char* is not printed as a C string.
        void* raw = ptr;
        std::cout<<"De-Allocated: "<<raw<<" of size: "<<n<<"\n";
        ::operator delete(raw);
    }
};

Then I ran the following test case:

int main()
{
    typedef std::basic_string<char, std::char_traits<char>, PSAllocator<char>> cstring;

    cstring* str = new cstring();
    str->resize(1);
    delete str;

    std::cout<<"\n\n\n\n";

    typedef std::vector<char, PSAllocator<char>> cvector;

    cvector* cv = new cvector();
    cv->resize(1);
    delete cv;
}

For whatever odd reason, it goes on to print:

Allocated: 0x3560a0 of size: 25
Allocated: 0x3560d0 of size: 26
De-Allocated: 0x3560a0 of size: 25
De-Allocated: 0x3560d0 of size: 26




Allocated: 0x351890 of size: 1
De-Allocated: 0x351890 of size: 1

So why does std::string allocate twice, and so many more bytes than std::vector?

I'm using g++ 4.8.1 x64 sjlj on Windows 8 from: http://sourceforge.net/projects/mingwbuilds/.

Upvotes: 3

Views: 560

Answers (1)

MvG

Reputation: 60868

I can't reproduce the double allocation, since apparently my libstdc++ does not allocate anything at all for the empty string. The resize, however, does allocate 26 bytes, and gdb helps me identify how they are made up:

size_type __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
                   (     1     + 1) *     1          +     24

So the memory is mostly for this _Rep representation, which in turn consists of the following data members:

size_type    _M_length;   // 8 bytes
size_type    _M_capacity; // 8 bytes
_Atomic_word _M_refcount; // 4 bytes

I guess the last four bytes are just padding for alignment, but I might have missed a data member.
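The arithmetic above can be checked with a small sketch. RepLike is a hypothetical stand-in that only mirrors the three members listed above (it is not libstdc++'s actual _Rep), and the refcount is assumed to be a plain int; rep_allocation_size and rep_tail_padding are likewise names made up for this illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for the header described above; NOT libstdc++'s real _Rep.
struct RepLike {
    std::size_t length;    // 8 bytes on a typical 64-bit target
    std::size_t capacity;  // 8 bytes on a typical 64-bit target
    int         refcount;  // 4 bytes; assumed to match _Atomic_word
    // The compiler pads the struct up to a multiple of alignof(std::size_t).
};

// Mirrors the quoted formula: (capacity + 1) * sizeof(CharT) + sizeof(header).
template <typename CharT>
constexpr std::size_t rep_allocation_size(std::size_t capacity) {
    return (capacity + 1) * sizeof(CharT) + sizeof(RepLike);
}

// Bytes of tail padding the compiler adds after refcount.
constexpr std::size_t rep_tail_padding() {
    return sizeof(RepLike) - (2 * sizeof(std::size_t) + sizeof(int));
}
```

On a target where sizeof(std::size_t) is 8 and sizeof(int) is 4, rep_tail_padding() is 4, rep_allocation_size<char>(1) is 26, and rep_allocation_size<char>(0) is 25. Those are the two sizes printed in the question, assuming that build uses the same layout.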

I guess the main reason this _Rep structure lives on the heap is that it can then be shared among string instances, and perhaps also that the allocation can be skipped entirely for empty strings, as the lack of a first allocation on my system suggests.

To find out why your implementation doesn't make use of this empty-string optimization, have a look at the default constructor. Its implementation seems to depend on the value of _GLIBCXX_FULLY_DYNAMIC_STRING, which apparently is non-zero in your setup. I wouldn't advise changing that macro directly, since its name starts with an underscore followed by a capital letter, which marks it as reserved for the implementation. But you might find some public setting that affects this value.
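One way to check whether a given library build allocates for a default-constructed string is to count the allocator calls directly. This is only a minimal sketch: CountingAllocator, g_allocations, g_deallocations and allocations_for_empty_string are hypothetical names invented for the illustration, and the result depends on the library version and how it was configured.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical global counters for this sketch; not thread-safe.
static int g_allocations = 0;
static int g_deallocations = 0;

// Allocator that counts calls; it keeps the C++03-style typedefs and rebind
// so that older libstdc++ string implementations can use it as well.
template <typename T>
struct CountingAllocator {
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef T value_type;

    template <typename U>
    struct rebind { typedef CountingAllocator<U> other; };

    CountingAllocator() noexcept {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    pointer allocate(size_type n, const void* = 0) {
        ++g_allocations;
        return static_cast<pointer>(::operator new(n * sizeof(T)));
    }
    void deallocate(pointer p, size_type) {
        ++g_deallocations;
        ::operator delete(p);
    }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Returns how many allocations one default-constructed string performs.
inline int allocations_for_empty_string() {
    typedef std::basic_string<char, std::char_traits<char>, CountingAllocator<char> > cstring;
    const int allocs_before = g_allocations;
    const int deallocs_before = g_deallocations;
    {
        cstring s;   // default construction only, no resize
        (void)s;
    }
    // Whatever was allocated for the empty string must have been released again.
    assert(g_allocations - allocs_before == g_deallocations - deallocs_before);
    return g_allocations - allocs_before;
}
```

A result of 0 would suggest that the build shares a static empty representation or keeps short strings inside the object itself. A result of 1 would match the behaviour shown in the question.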

Upvotes: 3
