Need suggestion to improve speed for word break (dynamic programming)

The problem is: Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

For example, given s = "hithere", dict = ["hi", "there"].

Return true because "hithere" can be segmented as "hi there".

My implementation is below. It works fine for normal cases, but it becomes extremely slow for input like:

s = "aaaaaaaaaaaaaaaaaaaaaaab", dict = {"aa", "aaaaaa", "aaaaaaaa"}.

I want to memoize the substrings I have already processed, but I can't get it right. Any suggestions on how to improve this? Thanks a lot!

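#include <string>
#include <unordered_set>
using namespace std;

class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        int len = s.size();
        if(len<1) return true;
        // Try every prefix s[0..i]; if it is a word, recurse on the rest.
        // Nothing is cached, so the same suffix can be solved many times.
        for(int i(0); i<len; i++) {
            string tmp = s.substr(0, i+1);
            if((wordDict.find(tmp)!=wordDict.end()) 
               && (wordBreak(s.substr(i+1), wordDict)) )
                return true;
        }
        return false;
    }
};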

Answers (4)

marom

Try the following:

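#include <string>
#include <unordered_set>
using namespace std;

class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) 
    {
        if (s.empty()) return true;  // nothing left to segment
        for (const auto& w : wordDict)
        {
            if (w.empty()) continue;  // an empty word would recurse forever
            auto pos = s.find(w);
            if (pos != string::npos)
            {
                // Split around the first occurrence of w: both sides must be segmentable.
                if (wordBreak(s.substr(0, pos), wordDict) && 
                    wordBreak(s.substr(pos + w.size()), wordDict))
                    return true;
            }
        }
        return false;
    }
};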

Essentially, once you find a match, remove the matching part from the input string and continue testing on the smaller pieces on either side of it.

Sarah

Thanks for all the comments. I changed my previous solution to the implementation below. I haven't yet explored optimizing the dictionary, but those insights are very valuable and much appreciated.

For the current implementation, do you think it can be further improved? Thanks!

#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        int len = s.size();
        if(len<1) return true;
        if(wordDict.size()==0) return false;

        // dq[k] is true when the prefix s[0, k) can be segmented.
        vector<bool> dq (len+1,false);
        dq[0] = true;
        for(int i(0); i<len; i++) {// start point
            if(dq[i]) {
                for(int j(1); j<=len-i; j++) {// length of substring, 1:len
                    if(!dq[i+j] && wordDict.count(s.substr(i, j)) > 0)
                        dq[i+j] = true;
                }
            }
            if(dq[len]) return true;
        }
        return false;
    }
};
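One possible further tweak, not taken from the thread: no substring longer than the longest dictionary word can ever match, so the inner loop can stop at that length instead of running to the end of the string. This is a sketch assuming the same std::unordered_set dictionary; the name wordBreakBounded is hypothetical, and the table reach plays the same role as dq above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Bottom-up word break: reach[k] == true means the prefix s[0, k) can be
// segmented. Lengths beyond the longest dictionary word are never looked up.
bool wordBreakBounded(const std::string& s,
                      const std::unordered_set<std::string>& dict) {
    std::size_t maxLen = 0;
    for (const auto& w : dict) maxLen = std::max(maxLen, w.size());

    const std::size_t n = s.size();
    std::vector<bool> reach(n + 1, false);
    reach[0] = true;  // the empty prefix is always segmentable
    for (std::size_t i = 0; i < n; ++i) {
        if (!reach[i]) continue;
        // Only try lengths 1..maxLen that still fit inside s.
        for (std::size_t j = 1; j <= maxLen && i + j <= n; ++j) {
            if (!reach[i + j] && dict.count(s.substr(i, j)) > 0) {
                reach[i + j] = true;
            }
        }
    }
    return reach[n];
}
```

With the dictionary from the question, the inner loop does at most 8 lookups per start position instead of up to len.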

MSalters

It's logically a two-step process. Find all dictionary words within the input, consider the found positions (begin/end pairs), and then see if those words cover the whole input.

So you'd get for your example

aa:       {0,2}, {1,3}, {2,4}, ... {20,22}
aaaaaa:   {0,6}, {1,7}, ... {16,22}
aaaaaaaa: {0,8}, {1,9} ... {14,22}

This is a graph with nodes 0-23 and a bunch of edges. But node 23 (the b) is entirely unreachable, since no edge leads into it. This is now a simple graph-reachability problem.
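A minimal sketch of the two steps, assuming the dictionary is a std::unordered_set<std::string> as in the question (findWordEdges and coversWholeInput are hypothetical names). Step one collects every {begin,end} pair, including overlapping matches; step two runs a breadth-first search from node 0 and asks whether node s.size() is reachable.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Step 1: every {begin, end} pair where a dictionary word occurs in s.
std::vector<std::pair<std::size_t, std::size_t>> findWordEdges(
        const std::string& s, const std::unordered_set<std::string>& dict) {
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    for (const auto& w : dict) {
        if (w.empty()) continue;  // an empty word would create a self-loop
        // Restart the search one character later to find overlapping matches.
        for (std::size_t pos = s.find(w); pos != std::string::npos;
             pos = s.find(w, pos + 1)) {
            edges.emplace_back(pos, pos + w.size());
        }
    }
    return edges;
}

// Step 2: nodes are positions 0..s.size(); s can be segmented exactly when
// node s.size() is reachable from node 0. Plain breadth-first search.
bool coversWholeInput(const std::string& s,
                      const std::unordered_set<std::string>& dict) {
    const std::size_t n = s.size();
    std::vector<std::vector<std::size_t>> out(n + 1);
    for (const auto& edge : findWordEdges(s, dict)) {
        out[edge.first].push_back(edge.second);
    }

    std::vector<bool> seen(n + 1, false);
    std::queue<std::size_t> q;
    q.push(0);
    seen[0] = true;
    while (!q.empty()) {
        std::size_t v = q.front();
        q.pop();
        if (v == n) return true;
        for (std::size_t e : out[v]) {
            if (!seen[e]) {
                seen[e] = true;
                q.push(e);
            }
        }
    }
    return false;
}
```

For "aaaab" with the single word "aa", step one yields {0,2}, {1,3} and {2,4}; no edge ends at node 5, so the answer is false.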

Finding all places where dictionary words occur is pretty easy if your dictionary is organized as a trie. But even a std::map is usable, thanks to its equal_range method. You have what appears to be an O(N*N) nested loop over begin and end positions, with an O(log N) lookup for each word. But you can quickly determine whether s.substr(begin, end - begin) is still a viable prefix, and which dictionary words remain with that prefix.
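A sketch of the "still a viable prefix" check with an ordered container. It uses std::set<std::string> and its lower_bound member rather than std::map::equal_range, and the helper names isViablePrefix and wordEndsFrom are hypothetical. The idea it relies on: if any word starts with a given prefix, then the smallest word that is not less than that prefix starts with it too, so one lookup answers the question.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// True if some dictionary word starts with `prefix`.
bool isViablePrefix(const std::set<std::string>& dict, const std::string& prefix) {
    auto it = dict.lower_bound(prefix);  // smallest word >= prefix
    return it != dict.end() && it->compare(0, prefix.size(), prefix) == 0;
}

// End positions of dictionary words that start at `begin`. The scan stops as
// soon as s.substr(begin, end - begin) is no longer the prefix of any word.
std::vector<std::size_t> wordEndsFrom(const std::set<std::string>& dict,
                                      const std::string& s, std::size_t begin) {
    std::vector<std::size_t> ends;
    for (std::size_t end = begin + 1; end <= s.size(); ++end) {
        std::string piece = s.substr(begin, end - begin);
        if (!isViablePrefix(dict, piece)) break;  // no longer word can match
        if (dict.count(piece) > 0) ends.push_back(end);
    }
    return ends;
}
```

With the dictionary from the question, wordEndsFrom(dict, s, 0) stops after nine characters and reports the ends 2, 6 and 8.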

Also note that you can build the graph lazily. Starting at begin=0, you find edges {0,2}, {0,6} and {0,8} (and no others). You can now search from nodes 2, 6 and 8. You even have a good algorithm, A*, which suggests trying node 8 first (it is reachable in just one edge and is the closest to the end). From there you'll find edges {8,10}, {8,14}, {8,16}, etc. As you can see, you'll never need to build the part of the graph that contains {1,3}, because it's simply unreachable.
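A sketch of the lazy idea. It substitutes a plain depth-first search that tries the longest edge first for real A*, but it keeps the key property: edges are generated only for nodes that are actually reached. wordBreakLazy and its optional expanded output parameter are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Depth-first search over positions 0..s.size(), building edges on demand.
// If `expanded` is given, it records every node whose edges were generated.
bool wordBreakLazy(const std::string& s,
                   const std::unordered_set<std::string>& dict,
                   std::vector<std::size_t>* expanded = nullptr) {
    std::size_t maxLen = 0;
    for (const auto& w : dict) maxLen = std::max(maxLen, w.size());

    const std::size_t n = s.size();
    std::vector<bool> visited(n + 1, false);
    std::vector<std::size_t> stack{0};
    visited[0] = true;
    while (!stack.empty()) {
        std::size_t v = stack.back();
        stack.pop_back();
        if (v == n) return true;
        if (expanded) expanded->push_back(v);
        // Shorter edges are pushed first, so the longest one is popped next.
        for (std::size_t len = 1; len <= maxLen && v + len <= n; ++len) {
            std::size_t e = v + len;
            if (!visited[e] && dict.count(s.substr(v, len)) > 0) {
                visited[e] = true;
                stack.push_back(e);
            }
        }
    }
    return false;
}
```

Because every word in the question's dictionary has even length, expanded only ever contains even positions for that input: nodes such as 1 are never touched.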

Using graph theory, it's easy to see why your brute-force method breaks down. You arrive at node 8 (aaaaaaaa.aaaaaaaaaaaaaab) repeatedly (via 2+2+2+2, 2+6, 6+2 and 8), and each time you search the whole subgraph from there again.
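To put a number on "repeatedly", the sketch below counts the distinct paths from node 0 to each node. When the search ultimately fails, as it does for this input, the question's recursion is called once per such path, so paths[8] is how many times it re-searches the subgraph from node 8. countPathsTo is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// paths[i] = number of distinct ways to split the prefix s[0, i) into
// dictionary words. The counts grow exponentially, so unsigned long long
// can overflow on long inputs; this is only meant as an illustration.
std::vector<unsigned long long> countPathsTo(
        const std::string& s, const std::unordered_set<std::string>& dict) {
    const std::size_t n = s.size();
    std::vector<unsigned long long> paths(n + 1, 0);
    paths[0] = 1;  // one (empty) way to reach the start
    for (std::size_t i = 0; i < n; ++i) {
        if (paths[i] == 0) continue;
        for (std::size_t j = i + 1; j <= n; ++j) {
            if (dict.count(s.substr(i, j - i)) > 0) paths[j] += paths[i];
        }
    }
    return paths;
}
```

For ten a's followed by b and the question's dictionary, paths[8] is 4 (the four routes listed above) and paths[10] is 6, while paths[11] is 0 because no word ends in b.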

A further optimization is to run bidirectional A*, which would give you a very fast solution. In the backward half of the first step, you look for edges leading into node 23 (the b). Since none exist, you immediately know that node 23 is isolated.
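A sketch of searching from both ends. To keep it short, it uses plain bidirectional breadth-first search instead of bidirectional A*, expanding one level from each end per round. wordBreakBidirectional is a hypothetical name. If the two searches meet at some node, that node is reachable from 0 and can reach the end; if either frontier runs dry first, no segmentation exists.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

bool wordBreakBidirectional(const std::string& s,
                            const std::unordered_set<std::string>& dict) {
    std::size_t maxLen = 0;
    for (const auto& w : dict) maxLen = std::max(maxLen, w.size());
    const std::size_t n = s.size();
    if (n == 0) return true;

    // fwd[k]: k reachable from 0.  bwd[k]: n reachable from k.
    std::vector<bool> fwd(n + 1, false), bwd(n + 1, false);
    std::vector<std::size_t> fFront{0}, bFront{n}, next;
    fwd[0] = true;
    bwd[n] = true;
    while (!fFront.empty() && !bFront.empty()) {
        // Forward level: words that START at a frontier node.
        next.clear();
        for (std::size_t v : fFront) {
            for (std::size_t len = 1; len <= maxLen && v + len <= n; ++len) {
                std::size_t e = v + len;
                if (fwd[e] || dict.count(s.substr(v, len)) == 0) continue;
                if (bwd[e]) return true;  // the two searches met
                fwd[e] = true;
                next.push_back(e);
            }
        }
        fFront.swap(next);
        // Backward level: words that END at a frontier node.
        next.clear();
        for (std::size_t v : bFront) {
            for (std::size_t len = 1; len <= maxLen && len <= v; ++len) {
                std::size_t b = v - len;
                if (bwd[b] || dict.count(s.substr(b, len)) == 0) continue;
                if (fwd[b]) return true;  // the two searches met
                bwd[b] = true;
                next.push_back(b);
            }
        }
        bFront.swap(next);
    }
    return false;
}
```

For the question's input, the backward side finds no word ending in b, its frontier is empty after the first round, and the function returns false without exploring further.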

Petr

In your code, you are not using dynamic programming because you are not remembering the subproblems that you have already solved.

You can enable this remembering, for example, by storing the results keyed by the starting position of the string s within the original string, or even by its length: every string you recurse on is a suffix of the original string, so its length uniquely identifies it. Then, at the beginning of your wordBreak function, check whether that length has already been processed; if it has, return the stored value instead of rerunning the computation. Otherwise, run the computation and store the result.
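A minimal sketch of that memoization. It keeps the top-down shape of the question's code but passes the suffix's starting position instead of copying the suffix, and caches the answer per position (equivalent to caching by suffix length). The class name MemoWordBreak is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

class MemoWordBreak {
public:
    bool wordBreak(const std::string& s,
                   const std::unordered_set<std::string>& wordDict) {
        memo_.assign(s.size() + 1, -1);  // reset the cache for this input
        return solve(s, wordDict, 0);
    }

private:
    // memo_[pos]: -1 = not computed yet, 0 = false, 1 = true.
    std::vector<int> memo_;

    // Can the suffix s[pos..] be segmented?
    bool solve(const std::string& s,
               const std::unordered_set<std::string>& dict, std::size_t pos) {
        if (pos == s.size()) return true;              // empty suffix
        if (memo_[pos] != -1) return memo_[pos] == 1;  // already solved
        bool ok = false;
        for (std::size_t end = pos + 1; end <= s.size() && !ok; ++end) {
            ok = dict.count(s.substr(pos, end - pos)) > 0 && solve(s, dict, end);
        }
        memo_[pos] = ok ? 1 : 0;  // store before returning
        return ok;
    }
};
```

Each start position is now solved at most once, so node 8 from the graph view is searched once no matter how many routes lead to it.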

Note also that your approach with unordered_set will not allow you to obtain the fastest solution. The fastest solution I can think of is O(N^2): store all the words in a trie (not in a map!) and follow the trie as you walk along the given string. This achieves O(1) work per loop iteration, not counting the recursive call.
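A sketch of the trie approach, assuming plain byte strings. TrieNode and wordBreakTrie are hypothetical names, and the sketch uses an iterative table over start positions rather than recursion. From every reachable start i it walks the trie along s[i], s[i+1], ... and stops as soon as no dictionary word can continue, so each step is O(1) and the scan is O(N^2) in the worst case, plus the time to build the trie.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Trie over bytes. 256 child slots per node is simple but memory-hungry;
// a map or sorted vector of children would be leaner.
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 256> next{};
    bool isWord = false;
};

bool wordBreakTrie(const std::string& s, const std::vector<std::string>& words) {
    TrieNode root;
    for (const auto& w : words) {
        TrieNode* node = &root;
        for (unsigned char c : w) {
            if (!node->next[c]) node->next[c] = std::make_unique<TrieNode>();
            node = node->next[c].get();
        }
        node->isWord = true;
    }

    const std::size_t n = s.size();
    std::vector<bool> reach(n + 1, false);  // reach[k]: s[0, k) segmentable
    reach[0] = true;
    for (std::size_t i = 0; i < n; ++i) {
        if (!reach[i]) continue;
        // Follow the trie from position i; stop when no word continues.
        const TrieNode* node = &root;
        for (std::size_t j = i; j < n; ++j) {
            node = node->next[static_cast<unsigned char>(s[j])].get();
            if (node == nullptr) break;
            if (node->isWord) reach[j + 1] = true;
        }
    }
    return reach[n];
}
```

Compared with the unordered_set version, no substring is ever copied or hashed: the walk from each start shares work across all word lengths.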
