Reputation: 2137
For example, I have a string:
/div1/div2[/div3[/div4]]/div5/div6[/div7]
Now I want to split the content by "/" and ignore the content inside "[ ]".
The result should be:
div1
div2[/div3[/div4]]
div5
div6[/div7]
How can I get this result using a regular expression? My programming language is JavaScript.
Upvotes: 4
Views: 623
Reputation: 81
Without knowing which regex engine you are targeting, I can only guess what would work for you. If you are using .NET, have a look here: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
If you're using perl, have a look here: http://metacpan.org/pod/Regexp::Common::balanced
Upvotes: 0
Reputation: 75232
Judging by your posting history, I'll guess you're talking about C# (.NET) regexes. In that case, this should work:
Regex.Split(target, @"(?<!\[)/");
This assumes every non-delimiter / is immediately preceded by a left square bracket, as in your sample data.
You should always specify which regex flavor you're working with. This technique, for example, requires a flavor that supports lookbehinds. Off the top of my head, that includes Perl, PHP, Python and Java, but not JavaScript.
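Since the question is about JavaScript: engines that implement ECMAScript 2018 or later do accept lookbehind, so the same split can be sketched there as well. This is only a minimal sketch under the same assumption as above (every non-delimiter / directly follows a [); the function name splitOnUnbracketedSlash is made up for illustration.

```javascript
// Assumes an ES2018+ engine (lookbehind support); older engines will
// throw a SyntaxError on this regex literal.
function splitOnUnbracketedSlash(target) {
  // Split on every "/" that is NOT immediately preceded by "[".
  // The leading "/" produces an empty first element, so drop empty strings.
  return target.split(/(?<!\[)\//).filter((s) => s !== "");
}

const sample = "/div1/div2[/div3[/div4]]/div5/div6[/div7]";
console.log(splitOnUnbracketedSlash(sample));
```

The filter step is only there because the input starts with a delimiter; without it the first element is an empty string.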
EDIT: Here's a demonstration in Java:
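public class Test
{
    public static void main(String[] args)
    {
        String str = "/div1/div2[/div3[/div4]]/div5/div6[/div7]";
        // Split on every "/" that is not immediately preceded by "[".
        // Because str starts with "/", parts[0] is an empty string.
        String[] parts = str.split("(?<!\\[)/");
        for (String s : parts)
        {
            System.out.println(s);
        }
    }
}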
output:
div1
div2[/div3[/div4]]
div5
div6[/div7]
Of course, I'm relying on some simplifying assumptions here. I trust you'll let me know if any of my assumptions are wrong, Mike. :)
EDIT: Still waiting on a ruling from Mike about the assumptions, but Chris Lutz brought up a good point in his comment to 280Z28. At the root level in the sample string, there are two places where you see two contiguous /divN tokens, but at every other level the tokens are always isolated from each other by square brackets. My solution, like 280Z28's, assumes that will always be true, but what if the data looked like this?
/div1/div2[/div3/div8[/div4]/div9]/div5/div6[/div7]
Now we've got two places where a non-delimiter slash is not preceded by a left square bracket, but the basic idea still holds. Starting from any point at the root level, if you scan forward looking for square brackets, the first one you find will always be a left (or opening) bracket. If you scan backward, you'll always find a right (or closing) bracket first. Unless both of those conditions hold, you're not at the root level. Translating that to lookarounds, you get this:
/(?![^\[\]]*\])(?<!\[[^\[\]]*)
I know it's getting pretty gnarly, but I'll take this over that godawful recursion stuff any day of the week. ;) Another nice thing is that you don't have to know anything about the tokens except that they start with slashes and don't contain any square brackets. By the way, this regex contains a lookbehind that can match any number of characters; the list of regex flavors that support that is very short indeed, but .NET can do it.
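For what it's worth, the two-lookaround pattern can also be tried in JavaScript on an ES2018+ engine, since lookbehind there is not limited to a fixed length. The following is a rough sketch, not a tested production solution; rootSlash and splitWithLookarounds are names invented for this example.

```javascript
// Assumes an ES2018+ engine with (variable-length) lookbehind support.
// A slash is treated as a delimiter when:
//   - scanning forward, the first bracket found is not "]"
//   - scanning backward, the first bracket found is not "["
const rootSlash = /\/(?![^\[\]]*\])(?<!\[[^\[\]]*)/;

function splitWithLookarounds(path) {
  // Drop the empty element produced by the leading "/".
  return path.split(rootSlash).filter((s) => s !== "");
}

console.log(
  splitWithLookarounds("/div1/div2[/div3/div8[/div4]/div9]/div5/div6[/div7]")
);
```

One limitation to keep in mind: the two checks only look at the nearest bracket on each side, so a slash that sits between two bracketed groups inside an outer bracket (for example the slash before c in /x[/a[/b]/c[/d]]) appears to pass both checks and would be split on as well.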
Upvotes: 1
Reputation: 14031
This works...
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string testCase = "/div1/div2[/div3[/div4]]/div5/div6[/div7]";
        //string pattern = "(?<Match>/div\\d(?:\\[(?>\\[(?<null>)|\\](?<-null>)|.?)*(?(null)(?!))\\])?)";
        string pattern = "(?<Match>div\\d(?:\\[(?>\\[(?<null>)|\\](?<-null>)|.?)*(?(null)(?!))\\])?)";
        Regex rx = new Regex(pattern);
        MatchCollection matches = rx.Matches(testCase);
        foreach (Match match in matches)
            Console.WriteLine(match.Value);
        Console.ReadLine();
    }
}
Courtesy of... http://retkomma.wordpress.com/2007/10/30/nested-regular-expressions-explained/
Upvotes: 2
Reputation: 342393
An experimental example using PHP and a split approach; only tested on the sample string.
$str = "/div1/div2[/div3[/div4]]/div5/div6[/div7]/div8";
// split on "/"
$s = explode("/",$str);
foreach ($s as $k=>$v){
    // if no [ or ] in the item
    if( strpos($v,"[")===FALSE && strpos($v,"]") ===FALSE){
        print "\n";
        print $v."\n";
    }else{
        print $v . "/";
    }
}
output:
div1
div2[/div3[/div4]]/
div5
div6[/div7]/
div8
Note: the bracketed items end with a trailing "/", so a bit of trimming is needed to get the desired result.
Upvotes: 0
Reputation: 99889
You can't do this with regular expressions because it's recursive. (That answers your question, now to see if I can solve the problem elegantly...)
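To illustrate why the nesting matters: a plain scanner that keeps a bracket depth counter handles arbitrary nesting without any regex. This is a different technique from the regex answers in this thread, shown only as a minimal JavaScript sketch; splitAtRootLevel is a hypothetical name and the function does not validate that the brackets are balanced.

```javascript
// Split on "/" only while the bracket depth is zero.
function splitAtRootLevel(path) {
  const parts = [];
  let depth = 0;     // number of "[" currently open
  let current = "";  // token being accumulated

  for (const ch of path) {
    if (ch === "/" && depth === 0) {
      // Root-level delimiter: finish the current token (skip empty ones).
      if (current !== "") parts.push(current);
      current = "";
      continue;
    }
    if (ch === "[") depth += 1;
    else if (ch === "]") depth -= 1;
    current += ch;
  }
  if (current !== "") parts.push(current);
  return parts;
}

console.log(splitAtRootLevel("/div1/div2[/div3[/div4]]/div5/div6[/div7]"));
```

Because it tracks depth rather than looking at neighbouring characters, it does not depend on every inner slash being preceded by a [.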
Edit: aem tipped me off! :D
Works as long as every [ is followed by /. It does not verify that the string is in the correct format.
// Select and ToArray require: using System.Linq;
string temp = text.Replace("[/", "[");
string[] elements = temp.Split('/').Select(element => element.Replace("[", "[/")).ToArray();
Upvotes: 3
Reputation: 3916
You can first translate the two-character sequence [/ into another character or sequence that you know won't appear in the input, then split the string on / boundaries, then re-translate the translated sequence back into [/ in the result strings. This doesn't even require regular expressions. :)
For instance, if you know that [ won't appear on its own in your input sequences, you could replace [/ with [ in the initial step.
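In JavaScript, the translate, split, translate-back steps might look roughly like the sketch below. The placeholder character and the function name splitViaPlaceholder are assumptions made for the example; pick any placeholder you are sure cannot occur in your input.

```javascript
// Assumed never to appear in the input; change it if that does not hold.
const PLACEHOLDER = "\u0000";

function splitViaPlaceholder(path) {
  return path
    .split("[/").join(PLACEHOLDER)                 // 1. hide every "[/" pair
    .split("/")                                    // 2. split on the remaining slashes
    .filter((s) => s !== "")                       //    drop the empty leading element
    .map((s) => s.split(PLACEHOLDER).join("[/"));  // 3. restore "[/"
}

console.log(splitViaPlaceholder("/div1/div2[/div3[/div4]]/div5/div6[/div7]"));
```

Like the original suggestion, this only protects slashes that come directly after a [, so it relies on the input always having that shape.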
Upvotes: 2