Reputation:
I'm trying to create a regex that finds two substrings within a given string, so that the returned value is an array containing two elements: the two matched strings. I understand that my problem is closely related to matching palindromes, which cannot be done with a regex, but I'm hoping there is a regex that gets close enough, since the structure I expect to read has a finite size.
To be very specific, I only care about matching the two top-level children, as in the first example. The number of nested brackets inside them doesn't matter at all, whether there is 1 or 99999 of them.
Note that the spacing is just for readability; the input string will have no spaces. The simplest structure is:
{ }{ }
and should be accepted as two strings:
{ } and { }
Contained within this, there can be any number of groupings of braces:
{ {} {} {} {} {} {} }{ {} }
and should be accepted as two strings:
{ {} {} {} {} {} {} } and { {} }
Any of these inner groupings can in turn contain arbitrarily deep recursive groupings, like:
{{{{ }{{ }}{ }}}}{{ }{ }{ }}
and should be accepted as two strings:
{{{{ }{{ }}{ }}}} and {{ }{ }{ }}
I've thought about this problem for quite a while and couldn't come up with a proper solution. I also haven't found any online tool that shows these two substrings visually; every one I tried just matches the whole string. I've also used regex generators like "http://regex.inginf.units.it/", giving it the maximum number of strings and all the edge cases I could think of, but only got about 40% accuracy. I'm hoping someone who knows the subject better than I do can come up with a regex that produces the expected answers for the 7 examples below and for any other string constructed from the rules above.
I made a simple HTML page to test my strings (just edit the "reg" variable in the script tag to change the regex, then refresh the page to view the results):
var reg = /({({.*})*})/g;
var str1 = "{}{}";
var str2 = "{{}{}}{{}}";
var str3 = "{{{{{}{}{}{}}{{}}}}{}}{}";
var str4 = "{{{{{{{{{{{{{{{{{}}{{}}}}}{{}}}}}{{}}}}}{{}}}}}{{}}}}}{{}}";
var str5 = "{{}{{{{{{}{}}}}{{{{}{}}}{}}}}{}{{{}{{}}}}}{{{{{}}{{{{}{}}}}}}{{{{}}{{{{}{}}}}}}}";
var str6 = "{{}{}}{{}{{{}{}}}}";
var str7 = "{{}{}}{{{{{}}{{}}}}{{{}{}}}}";
var s1 = document.getElementById("d1").innerHTML = str1.match(reg);
var s2 = document.getElementById("d2").innerHTML = str2.match(reg);
var s3 = document.getElementById("d3").innerHTML = str3.match(reg);
var s4 = document.getElementById("d4").innerHTML = str4.match(reg);
var s5 = document.getElementById("d5").innerHTML = str5.match(reg);
var s6 = document.getElementById("d6").innerHTML = str6.match(reg);
var s7 = document.getElementById("d7").innerHTML = str7.match(reg);
<p id="d1"></p>
<p id="ans1">{},{}</p>
<p id="d2"></p>
<p id="ans2">{{}{}},{{}}</p>
<p id="d3"></p>
<p id="ans3">{{{{{}{}{}{}}{{}}}}{}},{}</p>
<p id="d4"></p>
<p id="ans4">{{{{{{{{{{{{{{{{{}}{{}}}}}{{}}}}}{{}}}}}{{}}}}}{{}}}}},{{}}</p>
<p id="d5"></p>
<p id="ans5">{{}{{{{{{}{}}}}{{{{}{}}}{}}}}{}{{{}{{}}}}},{{{{{}}{{{{}{}}}}}}{{{{}}{{{{}{}}}}}}}</p>
<p id="d6"></p>
<p id="ans6">{{}{}},{{}{{{}{}}}}</p>
<p id="d7"></p>
<p id="ans7">{{}{}},{{{{{}}{{}}}}{{{}{}}}}</p>
Upvotes: 1
Views: 124
Reputation: 270890
Regex is not suitable for this task (at least the JS flavour isn’t). Anything that involves structures that can be arbitrarily nested is not suitable to be matched with regex. This is why they say you should not use regex to parse HTML or JSON. See this answer for more info.
The string you have here is quite simple to parse without using regex. By using regex you are kind of making life hard for yourself.
Here's how to parse this string (assuming the brackets are always balanced):
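One way to do this, as a minimal sketch: walk the string once while tracking the nesting depth, and cut out a substring each time the depth returns to zero. The function name `splitTopLevel` is made up for illustration, and the sketch assumes the input contains only balanced `{` and `}` characters.

```javascript
// Hypothetical helper: splits a balanced brace string into its top-level groups.
// Assumes the braces are always balanced; no validation is done.
function splitTopLevel(str) {
  const groups = [];
  let depth = 0; // current nesting depth
  let start = 0; // index where the current top-level group begins

  for (let i = 0; i < str.length; i++) {
    if (str[i] === "{") {
      // An opening brace at depth 0 starts a new top-level group.
      if (depth === 0) start = i;
      depth++;
    } else if (str[i] === "}") {
      depth--;
      // Back at depth 0: the current top-level group is complete.
      if (depth === 0) groups.push(str.slice(start, i + 1));
    }
  }
  return groups;
}

console.log(splitTopLevel("{{}{}}{{}}")); // ["{{}{}}", "{{}}"]
```

Because only the depth counter matters, the inner nesting can be as deep as you like. For the strings in the question this returns an array of two elements, matching the expected answers.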
Upvotes: 3