Reputation: 802
Suppose I have a collection of substrings, for example:
string a = {"cat","sensitive","ate","energy","tense"}
Then the output for this should be as follows:
catensesensitivenergy
How can I do this?
Upvotes: 6
Views: 502
Reputation: 6086
Break the problem down and see what we got. Starting with only two strings. We must check which suffix of one string is the longest prefix of the other. This gives us the order for the best concatenation.
Now, with a set of n word, how do we do ? We start by building a trie containing every word (a key for each word). If a word is a duplicate of an other we can easily flag it as such while building the prefix tree.
I made a quick implementation of a regular Trie. You can find it here.
We have the tools to build a directed graph linking the different words wether a suffix of the first is a prefix of the second. The weight of the edge is the length of the suffix.
To do so, for each word w of the input set, we must see which words we can reach with a suffix of w :
From there, the graph is set and we must find the shortest path that runs through every vertex (eg. word) only once. If the graph is complete, this is the Traveling Salesman Problem. Most of the times, the graph won't be complete.
Still, it remains a NP-hard problem. In more "technical" terms, the problem at hand is to find the shortest hamiltonian path of a digraph.
Note : Given an Hamiltonian path (if it exists) with its cost C, and its starting vertex (word) W, the supestring length is given by :
Lsuper = LW + C
Note : If two words have no suffix linking them to another word, then the graph is not connected and there is no hamiltonian path.
Upvotes: 1
Reputation: 18566
This problem is known as the shortest common superstring problem and it is NP-hard, so if you need an exact solution you cannot do much better then trying all possibilities and choosing the best one.
One possible exponential solution is to generate all permutations of the input strings, find the shortest common superstring greedily for each permutation(a permutation specifies the order of strings and it is possible to prove that for a fixed order greedy algorithm always works correctly) and choose the best one.
Upvotes: 5
Reputation: 1729
Using user2040251 suggestion:
#include <string>
#include <iostream>
#include <algorithm>
std::string merge_strings( const std::vector< std::string > & pool )
{
std::string retval;
for( auto s : pool )
if( retval.empty() )
retval.append( s );
else if( std::search( retval.begin(), retval.end(), s.begin(), s.end() ) == retval.end() )
{
size_t len = std::min( retval.size(), s.size() );
for( ; len; --len )
if( retval.substr( retval.size() - len ) == s.substr( 0, len ) )
{
retval.append( s.substr( len ) );
break;
}
if( !len )
retval.append( s );
}
return retval;
}
std::string shortest_common_supersequence( std::vector< std::string > & pool )
{
std::sort( pool.begin(), pool.end() );
std::string buffer;
std::string best_reduction = merge_strings( pool );
while( std::next_permutation( pool.begin(), pool.end() ) )
{
buffer = merge_strings( pool );
if( buffer.size() < best_reduction.size() )
best_reduction = buffer;
}
return best_reduction;
}
int main( int argc, char ** argv )
{
std::vector< std::string > a{"cat","sensitive","ate","energy","tense"};
std::vector< std::string > b{"cat","sensitive","ate","energy","tense","sit"};
std::vector< std::string > c{"personal","ate","energy","tense","gyroscope"};
std::cout << "best a --> \"" << shortest_common_supersequence( a ) << "\"\n";
std::cout << "best b --> \"" << shortest_common_supersequence( b ) << "\"\n";
std::cout << "best c --> \"" << shortest_common_supersequence( c ) << "\"\n";
return 0;
}
Output:
best a --> "catensensitivenergy"
best b --> "catensensitivenergy"
best c --> "atensenergyroscopersonal"
Upvotes: 3