Reputation: 816
In order to implement a DP algorithm for detecting whether a word is breakable into sub-words of length 'n' to 1, what should be the choice of data structure for caching DP results. Example: the word 'pita' can be broken into 'pit', 'it' & 'i' which are all valid dictionary words as well. The DP algorithm should detect such words.
I can imagine I need start from words of length 1 and build up substrings of increasing lengths but I am not able to arrive at a data structure which can efficiently store the subproblem's results
Upvotes: 1
Views: 653
Reputation: 353
I think using unordered_map<string, int>
or unordered_set<string>
might be very helpful.
Upvotes: 0
Reputation: 5940
I can imagine I need start from words of length 1 and build up substrings of increasing lengths but I am not able to arrive at a data structure which can efficiently store the subproblem's results
The following is a pseudo-code written in Java might not compile
TrieNode.java
class TrieNode
{
public TrieNode (double a, double b, double g, int size)
{
this.alpha = a;
this.beta = b;
this.gamma = g;
this.next = new TrieNode [size];
}
public double alpha, beta, gamma;
public TrieNode [] next;
}
You can replace alpha
, beta
, gamma
with your parameters for the problem.
Trie.java
class Trie
{
private TrieNode root;
private char base;
private final int ALPHABET_SIZE;
public Trie (int ALPHABET_SIZE, char base_char)
{
this.base = base_char;
this.root=new TrieNode(0,0,0,ALPHABET_SIZE);
}
public boolean insert (String word, double [] param)
{
TrieNode current = this.root;
int i = (-1);
if (!validate(word))
return false;
char [] c = word.toCharArray();
for (int i = 0; i < (c.length-1); i++)
{
if (current.next[c[i]-base] == null)
current.next[c[i]-base] = new TrieNode (0,0,0,ALPHABET_SIZE);
current = current.next[c[i]-base];
}
if (current.next[c[i]-base] == null)
current.next[c[i]-base] = new TrieNode (0,0,0,ALPHABET_SIZE);
current.next[c[i]-base].alpha = param[0];
current.next[c[i]-base].beta = param[1];
current.next[c[i]-base].gamma = param[2];
return true;
}
public boolean validate (String word)
{
for (char c : word.toCharArray())
if (c < base || c > (base+ALPHABET_SIZE-1))
return false;
}
}
This is an index-based Trie, with safe insert function.
MappedTrie.java
class MappedTrie
{
public final char [] ALPHABET;
private Trie trie;
MappedTrie (char [] alphabet)
{
ALPHABET = alphabet;
trie = new Trie (ALPHABET.length,0);
}
public boolean insert (String word, double [] param)
{
if (!validate(word))
return false;
String key = "";
for (char c : word.toCharArray())
key += encode(c);
return trie.insert(key,param);
}
private char encode (char c)
{
for (int i=0; i<ALPHABET.length; i++)
if (c == ALPHABET[i])
return i;
}
private char decode (char d)
{
return ALPHABET[d];
}
public boolean validate (String word)
{
boolean exist = false;
for (char c : word.toCharArray())
{
exist = false;
for (char d : ALPHABET) if (c == d) { exist = true; break; }
if (!exist) return false;
}
return true;
}
}
In case of a custom alphabet where characters are not adjecent, we can map it over index.
Test
class Test
{
public static void main (String [] args)
{
// 'a'+0, 'a'+1, 'a'+2
MappedTrie k = new MappedTrie(3, 'a');
k.insert("aa", { 0.1, 0.2, 0.3 });
k.insert("ab", { 0.4, 0.5, 0.6 });
k.insert("ac", { 0.7, 0.8, 0.9 });
k.insert("ba", { 0.01, 0.02, 0.03 });
k.insert("bb", { 0.04, 0.05, 0.06 });
k.insert("bc", { 0.07, 0.08, 0.09 });
k.insert("ca", { 0.001, 0.002, 0.003 });
k.insert("cb", { 0.004, 0.005, 0.006 });
k.insert("cc", { 0.007, 0.008, 0.009 });
// 'a' => 0, 'b' => 1, 'c' => 2
MappedTrie t = new MappedTrie({'a','b','c'});
t.insert("aa", { 0.1, 0.2, 0.3 });
t.insert("ab", { 0.4, 0.5, 0.6 });
t.insert("ac", { 0.7, 0.8, 0.9 });
t.insert("ba", { 0.01, 0.02, 0.03 });
t.insert("bb", { 0.04, 0.05, 0.06 });
t.insert("bc", { 0.07, 0.08, 0.09 });
t.insert("ca", { 0.001, 0.002, 0.003 });
t.insert("cb", { 0.004, 0.005, 0.006 });
t.insert("cc", { 0.007, 0.008, 0.009 });
}
}
Upvotes: 0
Reputation: 6246
There is no need for special datastructure as such a boolean DP array would be sufficient :-
isSentence[j] tell the substring str[j to n] is valid concatenation of english words/word where n is last index
isSentence[j] = or(isword(str[j to i]) && isSentece[i+1]) for all i from j to n
or here means logical OR of all subproblem and isword is dictionary lookup which returns boolean value and && logical AND
Upvotes: 1
Reputation: 526
if you work on your word sequential you don't need to track previous word as you don't reach them again, but you can use a cache to cache previously lookup sub word in the dictionary. If i understand the problem, then i think the following Java code may help you:
String term = "pita";
Set<String> dictionary = new HashSet<String>();
boolean ans = true;
for(int i = 0; i < term.length() && ans == true; i++) {
for(int j = i + 1; j < term.length(); j++) {
String subTerm = term.substring(i, j);
if(!dictionary.contains(subTerm)){
ans = false;
break;
}
}
}
System.out.println("ans [" = ans + "]");
For dictionary you can use a hash table and it support to take O(1) to check if the sub word exist or not. if you cache previously checked sub word it will take the same O(1)
I think this problem is fitted in sorting and searching technique not in DP because it did not use a previous answer to produce current answer.
Upvotes: 1