Reputation: 545
I'd like to use the C++ version of SymSpell, which is called SymSpellPlusPlus. In the C# version, using WordSegmentation looks like this (from the first link):
//word segmentation and correction for multi-word input strings with/without spaces
inputTerm="thequickbrownfoxjumpsoverthelazydog";
maxEditDistance = 0;
suggestion = symSpell.WordSegmentation(inputTerm);
//display term and edit distance
Console.WriteLine(suggestion.correctedString + " " + suggestion.distanceSum.ToString("N0"));
In the C++ version, the WordSegmentation method returns a shared pointer (from the second link):
...
shared_ptr<WordSegmentationItem> WordSegmentation(const char* input)
{
    return WordSegmentation(input, this->maxDictionaryEditDistance, this->maxDictionaryWordLength);
}
shared_ptr<WordSegmentationItem> WordSegmentation(const char* input, size_t maxEditDistance)
{
    return WordSegmentation(input, maxEditDistance, this->maxDictionaryWordLength);
}
shared_ptr<WordSegmentationItem> WordSegmentation(const char* input, size_t maxEditDistance, size_t maxSegmentationWordLength)
{
    // lines 1039 - 1179 under second link
    std::vector<shared_ptr<WordSegmentationItem>> compositions;
    ...
    return compositions[circularIndex];
}
In my code I tried, among other things, the following:
const char* inputTerm = "whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixtgrade and ins pired him";
auto suggestions = symSpell.WordSegmentation(inputTerm);
But it gives an error:
free() invalid next size (fast)
This is a memory error, but I don't know how to fix it.
The WordSegmentationItem class looks as follows (lines 292-325 in second link):
class WordSegmentationItem
{
public:
    const char* segmentedString{ nullptr };
    const char* correctedString{ nullptr };
    u_int8_t distanceSum = 0;
    double probabilityLogSum = 0;
    WordSegmentationItem() { }
    WordSegmentationItem(const symspell::WordSegmentationItem & p)
    {
        this->segmentedString = p.segmentedString;
        this->correctedString = p.correctedString;
        this->distanceSum = p.distanceSum;
        this->probabilityLogSum = p.probabilityLogSum;
    }
    WordSegmentationItem& operator=(const WordSegmentationItem&) { return *this; }
    WordSegmentationItem& operator=(WordSegmentationItem&&) { return *this; }
    void set(const char* pSegmentedString, const char* pCorrectedString, u_int8_t pDistanceSum, double pProbabilityLogSum)
    {
        this->segmentedString = pSegmentedString;
        this->correctedString = pCorrectedString;
        this->distanceSum = pDistanceSum;
        this->probabilityLogSum = pProbabilityLogSum;
    }
    ~WordSegmentationItem()
    {
        delete[] segmentedString;
        delete[] correctedString;
    }
};
How should I get the correctedString from the WordSegmentationItem?
Upvotes: 1
Views: 371
Reputation: 385295
The library is buggy and the author needs to make some fixes.
First, compiling gives us a warning about SuggestItem::ShallowCopy, which returns a local variable by reference. Very bad! We can change that to return by value, probably.
This doesn't fix the crash, though.
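To illustrate the kind of warning I mean, here is a minimal sketch of the return-by-reference problem and the return-by-value fix. The Item type and both function names are hypothetical stand-ins, not the library's actual SuggestItem or its real ShallowCopy signature.

```cpp
#include <string>

// Hypothetical stand-in for the library's suggestion type; not the real SuggestItem.
struct Item {
    std::string term;
    int distance = 0;
};

// Broken shape (left commented out on purpose): the local object is destroyed
// when the function returns, so the caller receives a dangling reference.
// Item& shallowCopyBroken(const Item& src) { Item tmp = src; return tmp; }

// Fixed shape: return by value. The compiler may move or elide the copy,
// and the caller owns a valid object either way.
Item shallowCopy(const Item& src) {
    Item tmp;
    tmp.term = src.term;
    tmp.distance = src.distance;
    return tmp;
}
```

Returning by value costs very little here and removes the undefined behaviour, which is why it is the usual first fix for this warning.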
If we clone the library's repo then run the following testcase in a debugger:
#include "symspell6.h"
int main()
{
    const char* inputTerm = "whereis th elove hehad dated forlmuch of thepast who couqdn'tread in sixtgrade and ins pired him";
    symspell::SymSpell symSpell;
    auto suggestions = symSpell.WordSegmentation(inputTerm);
}
…we see that returning compositions[circularIndex] from the WordSegmentation function is causing an invalid access in the shared_ptr constructor. This suggests that circularIndex is out-of-bounds and giving us a non-existent shared_ptr. Indeed, circularIndex is 95 but compositions.size() is 0!
The function is lacking some serious error checking.
Now, only the author (or at least someone who knows what the library is supposed to do; that's not me!) can fix this properly. But as a quick patch I added the following after line 1055:
if (compositions.empty())
    return nullptr;
…and it now at least runs.
It seems that the function assumes the dictionary is non-empty. I don't know whether that's expected behaviour or not (other than the missing error checking as detailed above).
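With a patch like that in place, the caller has to expect a null result. The sketch below shows the general pattern under stated assumptions: Segmentation, pickComposition and correctedOrEmpty are hypothetical names made up for illustration, Segmentation uses std::string rather than the raw const char* members of the real WordSegmentationItem, and the index check is slightly broader than the one-line patch above.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for WordSegmentationItem; std::string keeps the sketch short.
struct Segmentation {
    std::string correctedString;
    int distanceSum = 0;
};

// Same idea as the quick patch: never index past the end of the vector.
// An empty vector has size 0, so this check also covers the empty case.
std::shared_ptr<Segmentation> pickComposition(
    const std::vector<std::shared_ptr<Segmentation>>& compositions,
    std::size_t circularIndex)
{
    if (circularIndex >= compositions.size())
        return nullptr;
    return compositions[circularIndex];
}

// Caller side: test the pointer before reading any member.
std::string correctedOrEmpty(const std::shared_ptr<Segmentation>& result)
{
    return result ? result->correctedString : std::string();
}
```

Applied to the code in the question, the same idea would mean checking suggestions for null before reading suggestions->correctedString. Whether a null result is what the author intends for an empty dictionary is something only they can say.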
The project is in serious need of some documentation, because no preconditions or postconditions are mentioned for these functions and there is no indication as to how the library is supposed to be used. Again, the author should fix these things.
Upvotes: 3