user2766918
user2766918

Reputation: 584

dynamically sized classes in c++

I'd like to create a class Word, which houses a word. I'll have an instance of the class for pretty much every word in the dictionary (so lots) -- and no, I cannot use a tree structure to store this for my particular application. Of course, the size of the strings may vary, and I don't want to kill memory. I'd like to do this in a class, as so:

class Word {
    public:
        ...
    private:
        int         len;
        LetterData  letters[];
};

And then dynamically allocate Word using:

Word *pNewWord = malloc(sizeof(Word)+sizeof(LetterData)*len);   

I realize this isn't very C++'ish. So my questions are: first, is there a better way of doing this, and if not, is this going to cause problems? Word will not inherit any other class types (I'm fairly certain that inheritance would kill this...).

Note: memory usage and speed are very important -- I'd like to avoid an extra pointer per word, and I'd like to avoid an extra pointer deference per access...

Upvotes: 5

Views: 1626

Answers (4)

Sandro
Sandro

Reputation: 2786

Using open sized array as the last member of struct is quite common practice in C. And it was standardized in C99 (2.6.7."flexible array member").

C++ standard says nothing about "flexible array members". So it is not guarantee that it works in C++. But the most of the popular compilers support this feature even for C++:

  • gcc
  • msvc
  • clang "Note that clang does support flexible array members (arrays with a zero or unspecified size at the end of a structure)"

So actually it is possible to use it, but this is very dangerous practice. You should try to avoid it, and you need to be very careful if you decide to use it.

There are several potential issues if you are using this technique:

  • Any reallocation or deletion of memory buffer makes your object invalid.
  • Your class must be non-copyable. Any copy (usually implicit copy when you pass your object as an argument) makes the object invalid. You must forbid copy constructor and assignment operator.
  • Inheritance is not possible, you must forbid it.
  • You should initialize all data correctly to avoid garbage (maybe fill buffer with zeros and than use placement new operators)
  • This method is often used for serialization, saving and restoring the objects, but it is not portable. You need to think about sizes of members and big/little endians.

So if it is possible it is better to use something else. The solution with for example vector of LetterData as a member of class.

Upvotes: 5

ppetraki
ppetraki

Reputation: 428

The pointers aren't your problem, it's the lack of cache hits from having one object allocated over here and another one allocated elsewhere on the heap for the actual string. If you could allocate it all consecutively you would be in optimal cache hit territory.

I know you said no trees but if you wrote a custom allocator and plugged it into std::map or std::set you could achieve this functionality.

You should check out the cppcon talk on arena allocators.

"CppCon 2017: John Lakos “Local ('Arena') Memory Allocators”

A "simpler" way to achieve this is to maintain two vectors, for Word and a "insert chunk aligned vector for allocating range for LetterData" respectively. Word would get one more field which would be a offset into std::vector (or whatever segment size you choose). You would just placement new into that range and you could also just ignore the destructor, which will speed things up.

I bet once you fatten up your context object it will take just as much space as a map/set node.

Since you'll be stuffing tons of data into this it makes sense to preallocate the memory upfront. Perhaps you can achieve high performance by allocating sets of consecutive ranges in slabs as a opposed to one large consecutive range that will need to be "resized" eventually, which requires creating a new range and copying everything to it.

With all things, measure, measure, and measure before you go off and write a custom stack. Generate a range of random word lengths, search a subset, time it etc.

Upvotes: 2

ProXicT
ProXicT

Reputation: 1933

At a compile time, the compiler must be aware of the class size, so no matter what the class contains, the compiler needs to know the size of each member of that class.

Note, however, that the member letters is a pointer, which has a known size, so even if it points to a memory which can change it's size, it only needs to allocate space to hold that pointer.

If you want to use malloc to allocate the memory, which I don't recommend in C++, it's enough to give it the size of the class:

Word *pNewWord = reinterpret_cast<Word*>(malloc(sizeof(Word)));

However, if you sticked with C++ and used the new operator, the line reduces to:

Word *pNewWord = new Word;

Which not only allocates the memory needed, but also calls the class constructor, which is default in your case.

Upvotes: 0

Bathsheba
Bathsheba

Reputation: 234635

My starting point would be to build what would be, for now, little more than a proxy class for std::string:

class Word
{
public:
    std::size_t getSize() const; // in place of your 'len' member.

private:
    std::string m_data;
};

I would then build the rest of my program and only then if there is a performance issue with my Word class cf. other areas of the program would I attempt to refactor it, replacing m_data with something else.

My money is on that being unnecessary and your suspicions around the C++ standard library objects not having adequate performance unfounded.

Upvotes: 6

Related Questions