Parul Setia
Parul Setia

Reputation: 23

Spell Checker : Ternary Search tree

i have made a spell checker code using Ternary search tree. Can anybody tell me how to find the next possible word in TST. for example if i want to search if i search a word "Manly" in spell checker and the word is not present in TST, so the output it gives like DO YOU MEAN: "Man" "Mango" . .means possible near words

Upvotes: 1

Views: 757

Answers (2)

rit
rit

Reputation: 19

I have implemented my own spell checker but instead of a simple ternary trie I am using a ternary dag as Peter Kankowski suggests here. You can look at my blog for some details and how I did it. Its in Greek, but you can get an idea.

Edit:

Ok, you are right. The basic idea is to use a pre-created list of candidates for a given edit distance ( a value of 2 for me is ok ). To reduce the size of the list one can use wildcard characters. Such a list, of course, can be constructed in different ways. I prefer for / while loops like this ( e.g. for candidates of two substitutions )

void Substitute2( vector<wchar_t*>& v, const wstring& w )
{
    size_t len = w.size();
    if ( len < 2 )
        return;
    size_t p1 = 0, p2 = 1;
    while ( p1 < len ) {
        p2 = p1 + 1;
        while ( p2 < len ) {
            wchar_t* chars = new wchar_t[ len + 1 ];
            for ( size_t i = 0; i < len; ++i ) {
                if ( i != p1 && i != p2 ) {
                    chars[ i ] = w[ i ];
                }
            }
            chars[ p1 ] = '?';
            chars[ p2 ] = '?';
            chars[ len ] = '\0';
            v.push_back( chars );
            p2++;
        }
        p1++;
    }
}

After having prepared the candidates list, a simple search in a ternary dag for each item in the list will give us the suggestions for that misspelled word.

void Search( FileNode* pDict, FileNode* pNode, const wchar_t* Word, wstring Sug, set<wstring>& List )
{
    if ( IsNullLink( pNode, pDict ) )
        return;
    if ( *Word == '?' ) {
        Search( pDict, GetLo( pNode, pDict ), Word, Sug, List );
        Search( pDict, GetEq( pNode, pDict ), Word + 1, Sug + pNode->Char, List );
        Search( pDict, GetHi( pNode, pDict ), Word, Sug, List );
    } else {
        if ( *Word < pNode->Char ) {
            Search( pDict, GetLo( pNode, pDict ), Word, Sug, List );
        } else if ( *Word > pNode->Char ) {
            Search( pDict, GetHi( pNode, pDict ), Word, Sug, List );
        } else {
            if ( pNode->Char == '\0' )
            {
                List.insert( Sug );
            }
            if ( *Word != '\0' ) {
                Search( pDict, GetEq( pNode, pDict ), Word + 1, Sug + pNode->Char, List );
            }
        }
    }
}

Note: The dictionary is a compiled (file based) ternary dag

Upvotes: 1

Matt Ko
Matt Ko

Reputation: 979

The search for the word in your TST will terminate at a specific location in the tree. From this point, you can simply go up one level in the tree until you hit a level where there is more than just the child you came from.

On that level, you can simply choose the other possible paths and return those words.

Upvotes: 0

Related Questions