Reputation: 3
I am trying to generate regular expressions from some known texts. I assumed I could load the text into a tree structure and see what kind of tree is generated, but I have one issue that I can't figure out: I want cousins to be joined together.
For example:
ZABCDEF
ZBCCEFG
would result in:
  A-B-   D-E-F
Z-     C-
  B-C-   E-F-G
I don't want any sorting done as the goal is to match the text as is. Any hints would be most appreciated.
Upvotes: 0
Views: 654
Reputation: 1691
It would be easier to help if I knew what type of regular expression you want to build from the tree you wrote, but I think a tree is a bit more than you need here.
Assuming you want the characters that are the same in every string to be the anchors of the regular expression, all you need to do is track which indices hold the same character in all of the strings. This could be tracked with several data types, but the easiest one to explain is an array of boolean values, initialized to false by default. (If the strings are not all the same length, make the array as long as the second-longest string, not the longest: the trailing characters of the longest string have nothing to compare against, so they can never match.) You can then loop through the given strings and, whenever all the characters at an index are the same, set the boolean value at that index to true.
Then, to build the regular expression, check the boolean array at each index: if the value is true, you can place the character directly into the expression; otherwise you need to emit a choice between the characters from the different strings. Please note that this processing could also be done inline; there is no real need to store the data and process the strings a second time.
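As a rough illustration of the two passes described above (fill the boolean array, then build the expression), here is a minimal sketch for any number of strings. The helper name `BuildPattern` is hypothetical, and the sketch makes a few assumptions that are not part of the description above: the input array is non-empty, only the positions present in every string are compared (so it stops at the shortest string), and differing positions are written as an escaped alternation `(?:A|B)` rather than a character class so that special characters stay literal. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): one pattern from several samples.
static string BuildPattern(string[] samples)
{
    // Assumption: only look at positions that every sample has.
    int length = samples.Min(s => s.Length);

    // Pass 1: same[i] is true when every sample has the same character at index i.
    bool[] same = new bool[length];
    for (int i = 0; i < length; i++)
    {
        char first = samples[0][i];
        same[i] = samples.All(s => s[i] == first);
    }

    // Pass 2: emit a literal for shared characters, a choice for the rest.
    var sb = new StringBuilder();
    for (int i = 0; i < length; i++)
    {
        if (same[i])
        {
            sb.Append(Regex.Escape(samples[0][i].ToString()));
        }
        else
        {
            int index = i; // copy so the lambda below does not capture the loop variable
            var options = samples
                .Select(s => Regex.Escape(s[index].ToString()))
                .Distinct();
            sb.Append("(?:" + string.Join("|", options) + ")");
        }
    }
    return sb.ToString();
}

Console.WriteLine(BuildPattern(new[] { "ZABCDEF", "ZBCCEFG" }));
// Prints: Z(?:A|B)(?:B|C)C(?:D|E)(?:E|F)(?:F|G)
```

Merging the two loops into one gives the inline variant mentioned above; the boolean array is kept here only to mirror the explanation.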
If this is in the right direction, or if you can provide more information to send us in the right direction, I could come back and write a quick bit of code.
edit: Just a bit of code to explain what I was saying
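using System;
using System.Text;

string s1 = "ZABCDEF";
string s2 = "ZBCCEFG";
StringBuilder sb = new StringBuilder();
// Only compare positions that exist in both strings, so a shorter s2 cannot cause an out-of-range index.
int length = Math.Min(s1.Length, s2.Length);
for (int i = 0; i < length; ++i)
{
    if (s1[i] == s2[i])
    {
        // Same character in both strings: emit it as a literal.
        sb.Append(s1[i]);
        Console.WriteLine(" " + s1[i]);
    }
    else
    {
        // Different characters: emit a character class that allows either one.
        // Note: characters that are special inside [...] (such as ] or ^) are not escaped here.
        sb.Append("[" + s1[i] + s2[i] + "]");
        Console.WriteLine(s1[i] + " " + s2[i]);
    }
}
Console.WriteLine(sb);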
This outputs your diagram vertically, as well as the resulting expression, which matches either string:
Z
A B
B C
C
D E
E F
F G
Z[AB][BC]C[DE][EF][FG]
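To check what the resulting expression accepts, here is a small sketch using `Regex.IsMatch`. The `^` and `$` anchors and the two extra test strings are additions for illustration, not part of the output above. It shows that the pattern matches both inputs, but also any string that mixes the per-position choices, so it is looser than "exactly one of the two texts".

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the output above, anchored so the whole string has to match.
string pattern = "^Z[AB][BC]C[DE][EF][FG]$";

Console.WriteLine(Regex.IsMatch("ZABCDEF", pattern)); // True  (first input)
Console.WriteLine(Regex.IsMatch("ZBCCEFG", pattern)); // True  (second input)
Console.WriteLine(Regex.IsMatch("ZACCEEG", pattern)); // True  (mix of both inputs)
Console.WriteLine(Regex.IsMatch("ZABXDEF", pattern)); // False (fourth character must be C)
```

If only the original texts should match, a plain alternation such as `^(?:ZABCDEF|ZBCCEFG)$` would be stricter, at the cost of not generalizing at all.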
Upvotes: 1