Reputation: 3736
I need to identify the common prefix among a set of strings. I'm looking for the most efficient method, avoiding character-by-character comparisons.
For example, consider the following set of strings:
/home/texai/www/app/application/cron/logCron.log
/home/texai/www/app/application/jobs/logCron.log
/home/texai/www/app/var/log/application.log
/home/texai/www/app/public/imagick.log
/home/texai/www/app/public/status.log
The expected common prefix in this case is:
/home/texai/www/app/
Your insights and suggestions would be greatly appreciated.
Upvotes: 3
Views: 1541
Reputation: 1130
A trie of characters. A tree-like data structure used to store a dynamic set of strings where each node represents a single character.
Time Efficiency: Search, Insertion & Deletion: With a time complexity of O(m) where m is the length of the string.
Space Efficiency: Store the unique characters present in the strings. This can be a space advantage compared to storing the entire strings.
Example:
#include <unordered_map>
#include <string_view> // (Since C++17)
#include <memory>
#include <iostream>
// A trie of characters.
class Trie final
{
public:
// Function to insert a string into the trie.
void Insert(const std::string_view string_view)
{
if (!string_view.empty()) {
Insert(root_, string_view);
}
}
// Function to find the common prefix among trie strings.
auto CommonPrefix() const
{
std::string common_prefix{};
CommonPrefix(root_, common_prefix);
return common_prefix;
}
private:
// Trie__
// Structure representing a node in the trie.
struct Node final
{
std::unordered_map<char, std::unique_ptr<Node>> child_node{}; // Map of child nodes.
bool end_of_string{ false }; // Flag to indicate end of a string
};
Node root_{}; // Root node of the trie (Represents a null prefix.)
// __Trie
// Recursive function to insert a string into the trie.
void Insert(Node& root, const std::string_view string_view)
{
if (string_view.empty()) {
root.end_of_string = true; // Mark the end of the string
return;
}
char first_char{ string_view[0] }; // Get the first character of the string.
std::string_view remaining_view{ string_view.substr(1) }; // Get the substring starting from index 1.
if (root.child_node.find(first_char) == root.child_node.end()) {
// If the character does not exist in the trie, create a new node and insert it into the child nodes map.
root.child_node[first_char] = std::make_unique<Node>();
}
Insert(*root.child_node[first_char], remaining_view); // Recursively insert the remaining substring.
}
// Recursive function to find the common prefix among all inserted strings.
void CommonPrefix(const Node& root, std::string& common_prefix) const
{
// Check if there's only one child node and it's not the end of a string
if (root.child_node.size() != 1 || root.end_of_string) {
return;
}
common_prefix += root.child_node.begin()->first; // Append the current character to the common prefix.
CommonPrefix(*root.child_node.begin()->second, common_prefix); // Recursively find the common prefix in the next node.
}
};
int main()
{
// Create a trie object.
Trie trie{};
// Insert strings into the trie.
trie.Insert("/home/texai/www/app/application/cron/logCron.log");
trie.Insert("/home/texai/www/app/application/jobs/logCron.log");
trie.Insert("/home/texai/www/app/var/log/application.log");
trie.Insert("/home/texai/www/app/public/imagick.log");
trie.Insert("/home/texai/www/app/public/status.log");
// Find and print the common prefix among all inserted strings.
std::cout << "common prefix: " << trie.CommonPrefix() << std::endl;
}
Output:
common prefix: /home/texai/www/app/
--
PATRICIA trie. It can be even more efficient with PATRICIA (Practical Algorithm To Retrieve Information Coded In Alphanumeric), as @greybeared drew my attention to. You can read about it here: https://en.m.wikipedia.org/wiki/Radix_tree#PATRICIA
Upvotes: 2
Reputation: 93030
I'm not sure what you mean by avoid char by char comparative, but you at least need to read the common prefix from each of the strings, so the following algorithm is the best you can achieve (just iterate over the strings until they deviate or until the current longest prefix count is reached):
List<string> list = new List<string>()
{
"/home/texai/www/app/application/cron/logCron.log",
"/home/texai/www/app/application/jobs/logCron.log",
"/home/texai/www/app/var/log/application.log",
"/home/texai/www/app/public/imagick.log",
"/home/texai/www/app/public/status.log"
};
int maxPrefix = list[0].Length;
for(int i = 1; i < list.Count; i++)
{
int pos = 0;
for(; pos < maxPrefix && pos < list[i].Length && list[0][pos] == list[i][pos]; pos++);
maxPrefix = pos;
}
//this is the common prefix
string prefix = list[0].Substring(0, maxPrefix);
Upvotes: 1
Reputation: 56809
You cannot avoid going through at least the common parts to find common prefix.
I don't think this needs any fancy algorithm. Just keep track of the current common prefix, then shorten the prefix by comparing the current prefix with the next string.
Since this is common prefix of all strings, you may end up with empty string (no common prefix).
Upvotes: 2