Reputation: 3773
Problem:
Given a list of strings, find the substring which, if subtracted from the beginning of all strings where it matches and replaced by an escape byte, gives the shortest total length.
Example:
"foo"
, "fool"
, "bar"
The result is: "foo" as the base string with the strings "\0"
, "\0l"
, "bar"
and a total length of 9 bytes. "\0"
is the escape byte. The sum of the length of the original strings is 10, so in this case we only saved one byte.
A naive algorithm would look like:
for string in list
for i = 1, i < length of string
calculate total length based on prefix of string[0..i]
if better than last best, save it
return the best prefix
That will give us the answer, but it's something like O((n*m)^2), which is too expensive.
Upvotes: 2
Views: 983
Reputation: 47934
Use a forest of prefix trees (trie)...
f_2 b_1
/ |
o_2 a_1
| |
o_2 r_1
|
l_1
then, we can find the best result, and guarantee it, by maximizing (depth * frequency)
which will be replaced with your escape character. You can optimize the search by doing a branch and bound depth first search for the maximum.
On the complexity: O(C), as mentioned in comment, for building it, and for finding the optimal, it depends. If you order the first elements frequency (O(A) --where A is the size of the languages alphabet), then you'll be able to cut out more branches, and have a good chance of getting sub-linear time.
I think this is clear, I am not going to write it up --what is this a homework assignment? ;)
Upvotes: 7
Reputation: 103515
Well, first step would be to sort the list. Then one pass through the list, comparing each element with the previous, keeping track of the longest 2-character, 3-character, 4-character etc runs. Then figure is the 20 3-character prefixes better than the 15 4-character prefixes.
Upvotes: 1
Reputation: 37740
I would try starting by sorting the list. Then you simply go from string to string comparing the first character to the next string's first char. Once you have a match you would look at the next char. You would need to devise a way to track the best result so far.
Upvotes: 1