Reputation: 584
I'd like to create a class Word
, which houses a word. I'll have an instance of the class for pretty much every word in the dictionary (so lots) -- and no, I cannot use a tree structure to store this for my particular application. Of course, the size of the strings may vary, and I don't want to kill memory. I'd like to do this in a class, as so:
class Word {
public:
...
private:
int len;
LetterData letters[];
};
And then dynamically allocate Word using:
Word *pNewWord = malloc(sizeof(Word)+sizeof(LetterData)*len);
I realize this isn't very C++'ish. So my questions are: first, is there a better way of doing this, and if not, is this going to cause problems? Word will not inherit any other class types (I'm fairly certain that inheritance would kill this...).
Note: memory usage and speed are very important -- I'd like to avoid an extra pointer per word, and I'd like to avoid an extra pointer deference per access...
Upvotes: 5
Views: 1626
Reputation: 2786
Using open sized array as the last member of struct is quite common practice in C. And it was standardized in C99 (2.6.7."flexible array member").
C++ standard says nothing about "flexible array members". So it is not guarantee that it works in C++. But the most of the popular compilers support this feature even for C++:
So actually it is possible to use it, but this is very dangerous practice. You should try to avoid it, and you need to be very careful if you decide to use it.
There are several potential issues if you are using this technique:
So if it is possible it is better to use something else. The solution with for example vector of LetterData as a member of class.
Upvotes: 5
Reputation: 428
The pointers aren't your problem, it's the lack of cache hits from having one object allocated over here and another one allocated elsewhere on the heap for the actual string. If you could allocate it all consecutively you would be in optimal cache hit territory.
I know you said no trees but if you wrote a custom allocator and plugged it into std::map or std::set you could achieve this functionality.
You should check out the cppcon talk on arena allocators.
"CppCon 2017: John Lakos “Local ('Arena') Memory Allocators”
A "simpler" way to achieve this is to maintain two vectors, for Word and a "insert chunk aligned vector for allocating range for LetterData" respectively. Word would get one more field which would be a offset into std::vector (or whatever segment size you choose). You would just placement new into that range and you could also just ignore the destructor, which will speed things up.
I bet once you fatten up your context object it will take just as much space as a map/set node.
Since you'll be stuffing tons of data into this it makes sense to preallocate the memory upfront. Perhaps you can achieve high performance by allocating sets of consecutive ranges in slabs as a opposed to one large consecutive range that will need to be "resized" eventually, which requires creating a new range and copying everything to it.
With all things, measure, measure, and measure before you go off and write a custom stack. Generate a range of random word lengths, search a subset, time it etc.
Upvotes: 2
Reputation: 1933
At a compile time, the compiler must be aware of the class size, so no matter what the class contains, the compiler needs to know the size of each member of that class.
Note, however, that the member letters
is a pointer, which has a known size, so even if it points to a memory which can change it's size, it only needs to allocate space to hold that pointer.
If you want to use malloc
to allocate the memory, which I don't recommend in C++, it's enough to give it the size of the class:
Word *pNewWord = reinterpret_cast<Word*>(malloc(sizeof(Word)));
However, if you sticked with C++ and used the new
operator, the line reduces to:
Word *pNewWord = new Word;
Which not only allocates the memory needed, but also calls the class constructor, which is default in your case.
Upvotes: 0
Reputation: 234635
My starting point would be to build what would be, for now, little more than a proxy class for std::string
:
class Word
{
public:
std::size_t getSize() const; // in place of your 'len' member.
private:
std::string m_data;
};
I would then build the rest of my program and only then if there is a performance issue with my Word
class cf. other areas of the program would I attempt to refactor it, replacing m_data
with something else.
My money is on that being unnecessary and your suspicions around the C++ standard library objects not having adequate performance unfounded.
Upvotes: 6