Yogi Joshi
Yogi Joshi

Reputation: 816

Data structure choice for dynamic programming

In order to implement a DP algorithm for detecting whether a word is breakable into sub-words of length 'n' to 1, what should be the choice of data structure for caching DP results. Example: the word 'pita' can be broken into 'pit', 'it' & 'i' which are all valid dictionary words as well. The DP algorithm should detect such words.

I can imagine I need start from words of length 1 and build up substrings of increasing lengths but I am not able to arrive at a data structure which can efficiently store the subproblem's results

Upvotes: 1

Views: 653

Answers (4)

Omar Abdelrahman
Omar Abdelrahman

Reputation: 353

I think using unordered_map<string, int> or unordered_set<string> might be very helpful.

Upvotes: 0

Khaled.K
Khaled.K

Reputation: 5940

I can imagine I need start from words of length 1 and build up substrings of increasing lengths but I am not able to arrive at a data structure which can efficiently store the subproblem's results

The following is a pseudo-code written in Java might not compile

TrieNode.java

class TrieNode
{
    public TrieNode (double a, double b, double g, int size)
    {
        this.alpha = a;
        this.beta = b;
        this.gamma = g;
        this.next = new TrieNode [size];
    }
    public double alpha, beta, gamma;
    public TrieNode [] next;
}

You can replace alpha, beta, gamma with your parameters for the problem.

Trie.java

class Trie
{

    private TrieNode root;
    private char base;
    private final int ALPHABET_SIZE;

    public Trie (int ALPHABET_SIZE, char base_char)
    {
        this.base = base_char;
        this.root=new TrieNode(0,0,0,ALPHABET_SIZE);
    }

    public boolean insert (String word, double [] param)
    {
        TrieNode current = this.root;
        int i = (-1);

        if (!validate(word))
            return false;

        char [] c = word.toCharArray();

        for (int i = 0; i < (c.length-1); i++)
        {
            if (current.next[c[i]-base] == null)
                current.next[c[i]-base] = new TrieNode (0,0,0,ALPHABET_SIZE);
            current = current.next[c[i]-base];
        }

        if (current.next[c[i]-base] == null)
            current.next[c[i]-base] = new TrieNode (0,0,0,ALPHABET_SIZE);

        current.next[c[i]-base].alpha = param[0];
        current.next[c[i]-base].beta = param[1];
        current.next[c[i]-base].gamma = param[2];
        return true;
    }

    public boolean validate (String word)
    {
        for (char c : word.toCharArray())
            if (c < base || c > (base+ALPHABET_SIZE-1))
                return false;
    }

}

This is an index-based Trie, with safe insert function.

MappedTrie.java

class MappedTrie
{
    public final char [] ALPHABET;
    private Trie trie;

    MappedTrie (char [] alphabet)
    {
        ALPHABET = alphabet;
        trie = new Trie (ALPHABET.length,0);
    }

    public boolean insert (String word, double [] param)
    {
        if (!validate(word))
            return false;

        String key = "";

        for (char c : word.toCharArray())
            key += encode(c);

        return trie.insert(key,param);

    }

    private char encode (char c)
    {
        for (int i=0; i<ALPHABET.length; i++)
            if (c == ALPHABET[i])
                return i;
    }

    private char decode (char d)
    {
        return ALPHABET[d];
    }

    public boolean validate (String word)
    {
        boolean exist = false;

        for (char c : word.toCharArray())
        {
            exist = false;
            for (char d : ALPHABET) if (c == d) { exist = true; break; }
            if (!exist) return false;
        }

        return true;
    }

}

In case of a custom alphabet where characters are not adjecent, we can map it over index.

Test

class Test
{
    public static void main (String [] args)
    {
        // 'a'+0, 'a'+1, 'a'+2
        MappedTrie k = new MappedTrie(3, 'a');

        k.insert("aa", { 0.1, 0.2, 0.3 });
        k.insert("ab", { 0.4, 0.5, 0.6 });
        k.insert("ac", { 0.7, 0.8, 0.9 });

        k.insert("ba", { 0.01, 0.02, 0.03 });
        k.insert("bb", { 0.04, 0.05, 0.06 });
        k.insert("bc", { 0.07, 0.08, 0.09 });

        k.insert("ca", { 0.001, 0.002, 0.003 });
        k.insert("cb", { 0.004, 0.005, 0.006 });
        k.insert("cc", { 0.007, 0.008, 0.009 });

        // 'a' => 0, 'b' => 1, 'c' => 2
        MappedTrie t = new MappedTrie({'a','b','c'});

        t.insert("aa", { 0.1, 0.2, 0.3 });
        t.insert("ab", { 0.4, 0.5, 0.6 });
        t.insert("ac", { 0.7, 0.8, 0.9 });

        t.insert("ba", { 0.01, 0.02, 0.03 });
        t.insert("bb", { 0.04, 0.05, 0.06 });
        t.insert("bc", { 0.07, 0.08, 0.09 });

        t.insert("ca", { 0.001, 0.002, 0.003 });
        t.insert("cb", { 0.004, 0.005, 0.006 });
        t.insert("cc", { 0.007, 0.008, 0.009 });
    }
}

Upvotes: 0

Vikram Bhat
Vikram Bhat

Reputation: 6246

There is no need for special datastructure as such a boolean DP array would be sufficient :-

isSentence[j] tell the substring str[j to n] is valid concatenation of english words/word where n is last index

isSentence[j] = or(isword(str[j to i]) && isSentece[i+1]) for all i from j to n

or here means logical OR of all subproblem and isword is dictionary lookup which returns boolean value and && logical AND

Upvotes: 1

mohamed sulibi
mohamed sulibi

Reputation: 526

if you work on your word sequential you don't need to track previous word as you don't reach them again, but you can use a cache to cache previously lookup sub word in the dictionary. If i understand the problem, then i think the following Java code may help you:

String term = "pita";
Set<String> dictionary = new HashSet<String>();
boolean ans = true;
for(int i = 0; i < term.length() && ans == true; i++) {
    for(int j = i + 1; j < term.length(); j++) {
        String subTerm = term.substring(i, j);
        if(!dictionary.contains(subTerm)){
            ans = false;
            break;
        }
    }
}
System.out.println("ans [" = ans + "]");

For dictionary you can use a hash table and it support to take O(1) to check if the sub word exist or not. if you cache previously checked sub word it will take the same O(1)

I think this problem is fitted in sorting and searching technique not in DP because it did not use a previous answer to produce current answer.

Upvotes: 1

Related Questions