user2926033
user2926033

Reputation:

Reconstructing a string of words using a dictionary into an English sentence

I am completely stumped. The question is: given you have a string like "thisisasentence" and a function isWord() that returns true if it is an English word, I would get stuck on "this is a sent"

How can I recursively return and keep track of where I am each time?

Upvotes: 0

Views: 246

Answers (2)

hidefromkgb
hidefromkgb

Reputation: 5903

Just adding to the answer above.

According to my experience, many people understand recursion better when they see a «linearized» version of a recursive algorithm, which means «implemented as a loop over a stack». Linearization is applicable to any recursive task.

Assuming that isWord() has two parameters (1st: string to test; 2nd: its length) and returns a boolean-compatible value, a C implementation of backtracking is as follows:

void doSmth(char *phrase, int *words, int total) {
    int i;

    for (i = 0; i < total; ++i)
        printf("%.*s ", words[i + 1] - words[i], phrase + words[i]);
    printf("\n");
}

void parse(char *phrase) {
    int current, length, *words;

    if (phrase) {
        words = (int*)calloc((length = strlen(phrase)) + 2, sizeof(int));
        current = 1;
        while (current) {
            for (++words[current]; words[current] <= length; ++words[current])
                if (isWord(phrase + words[current - 1],
                           words[current] - words[current - 1])) {
                    words[current + 1] = words[current];
                    current++;
                }
            if (words[--current] == length)
                doSmth(phrase, words, current); /** parse successful! **/
        }
        free(words);
    }
}

As can be seen, for each word, a pair of stack values are used, the first of which being an offset to the current word`s first character, whereas the second is a potential offset of a character exactly after the current word`s last one (thus being the next word`s first character). The second value of the current word (the one whose pair is at the top of our «stack») is iterated through all characters left in the phrase.

When a word is accepted, a new second value (equalling the current, to only look at positions after it) is pushed to the stack, making the former second the first in a new pair. If the current word (the one just found) completes the phrase, something useful is performed; see doSmth().

If there are no more acceptable words in the remaining part of our phrase, the current word is considered unsuitable, and its second value is discarded from the stack, effectively repeating a search for words at a previous starting location, while the ending location is now farther than the word previously accepted there.

Upvotes: 0

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 727067

You need backtracking, which is easily achievable using recursion. Key observation is that you do not need to keep track of where you are past the moment when you are ready to return a solution.

You have a valid "split" when one of the following is true:

  • The string w is empty (base case), or
  • You can split non-empty w into substrings p and s, such that p+s=w, p is a word, and s can be split into a sentence (recursive call).

An implementation can return a list of words when successful split is found, or null when it cannot be found. Base case will always return an empty list; recursive case will, upon finding a p, s split that results in non-null return for s, construct a list with p prefixed to the list returned from the recursive call.

The recursive case will have a loop in it, trying all possible prefixes of w. To speed things up a bit, the loop could terminate upon reaching the prefix that is equal in length to the longest word in the dictionary. For example, if the longest word has 12 characters, you know that trying prefixes 13 characters or longer will not result in a match, so you could cut enumeration short.

Upvotes: 1

Related Questions