Reputation: 769
I'm working on some regex right now to isolate bracketed code such as this...
Regex: /\[(.*?)\]/
String: "<strong>[name]</strong>
<a href="http://www.example.com/place/[id]/">For more info...</a>"
Matched Fields: name, id
I'm looking to make this a bit more advanced. What I'm looking to do...
String: "[if:name <strong>[name]</strong>]
<a href="http://www.example.com/place/[id]/">For more info...</a>"
Matched Fields: if:name <strong>[name]</strong>, id
The problem is, I can't figure out any regex that'll work for this. I'm pretty sure I've killed the better half of my day, and I feel like I'm pretty close.
Here's what I've got at the moment that isn't doing what I want...
/\[([^\]]+)\]/
Anyone have any ideas?
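For reference, here is a quick check of what both attempts return on the second string. This is only a sketch in Python, chosen because, as far as I know, its `re` module handles these two particular patterns the same way PCRE does; the variable names are made up for illustration.

```python
import re

# The second sample string from above, split over two lines as in the post.
text = (
    '[if:name <strong>[name]</strong>]\n'
    '<a href="http://www.example.com/place/[id]/">For more info...</a>'
)

# Attempt 1: lazy dot. Attempt 2: negated character class.
lazy = re.findall(r"\[(.*?)\]", text)
negated = re.findall(r"\[([^\]]+)\]", text)

print(lazy)     # ['if:name <strong>[name', 'id']
print(negated)  # ['if:name <strong>[name', 'id']
```

Both patterns stop at the first `]` they see, so the outer bracket gets cut off at the end of the inner `[name]`. Neither pattern has any notion of nesting depth.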
Upvotes: 1
Views: 73
Reputation:
This might help if you just want balanced brackets and/or want to recurse into the core for inner brackets. Many nested levels can be handled. This is just a framework for potentially much more complex usage. The balanced-text part is actually the easier bit.
# (?:(?>[^\\\[\]]+|(?:\\[\S\s])+)|(?>\[((?:(?&core)|))\]())|([\[\]])())(?:\2|\4)(?(DEFINE)(?<core>(?>[^\\\[\]]++|(?:\\[\S\s])++|\[(?:(?&core)|)\])+))
 (?:
      (?>
           [^\\\[\]]+
        |
           (?: \\ [\S\s] )+
      )
   |
      (?>
           \[
           (                        # (1) core content
                (?:
                     (?&core)
                  |
                )
           )
           \]
           ( )                      # (2) core flag
      )
   |
      # unbalanced '[' or ']'
      ( [\[\]] )               # (3) error content
      ( )                      # (4) error flag
 )
 (?: \2 | \4 )            # only let match if core flag or error flag is set
                          # this filters search to square brackets only
 (?(DEFINE)
      # core
      (?<core>
           (?>
                [^\\\[\]]++
             |
                (?: \\ [\S\s] )++
             |
                \[
                # recurse core
                (?:
                     (?&core)
                  |
                )
                \]
           )+
      )
 )
# Perl sample, but the regex should also be valid in PHP
# ----------------------------
# use strict;
# use warnings;
#
#
# $/ = "";
#
# my $data = <DATA> ;
#
# parse( $data ) ;
#
#
# sub parse
# {
# my( $str ) = @_;
# while
# (
# $str =~ /
# (?:(?>[^\\\[\]]+|(?:\\[\S\s])+)|(?>\[((?:(?&core)|))\]())|([\[\]])())(?:\2|\4)(?(DEFINE)(?<core>(?>[^\\\[\]]++|(?:\\[\S\s])++|\[(?:(?&core)|)\])+))
# /xg
# )
#
# {
# if ( defined $1 )
# {
# print "found core \[$1\] \n";
# parse( $1 ) ;
# }
# if ( defined $3 )
# {
# print "unbalanced error '$3' \n";
# }
#
# }
# }
# __DATA__
#
# this [ [ is a test
# [ outter [ inner ] ]
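If it helps to see the same idea without a regex engine, below is a rough Python sketch that is meant to mirror what the Perl sample reports: every balanced core (recursing into it for inner cores) and every unbalanced bracket. Python's built-in `re` has no `(?&core)` style recursion, so this is a hand-written scanner and not the regex above; `find_close` and `parse` are hypothetical helper names, and the backslash handling is an assumption modelled on the `\\[\S\s]` part of the pattern.

```python
def find_close(s, start):
    """Return the index of the ']' that balances the '[' at s[start], or -1."""
    depth = 0
    i = start
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            i += 2          # skip an escaped character, brackets included
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1               # ran off the end: this '[' is never closed


def parse(s, events=None):
    """Collect ('core', text) and ('unbalanced', bracket) events in order."""
    if events is None:
        events = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            i += 2
        elif ch == "[":
            close = find_close(s, i)
            if close == -1:
                events.append(("unbalanced", "["))
                i += 1
            else:
                core = s[i + 1:close]
                events.append(("core", core))
                parse(core, events)   # look for inner cores, like parse($1)
                i = close + 1
        elif ch == "]":
            events.append(("unbalanced", "]"))
            i += 1
        else:
            i += 1
    return events


sample = "this [ [ is a test\n[ outter [ inner ] ]"
for kind, value in parse(sample):
    print(kind, repr(value))
```

On the sample data this reports the first two `[` as unbalanced, then the core ` outter [ inner ] `, then the inner core ` inner `. Rescanning from each `[` is quadratic in the worst case, which should not matter for template-sized strings.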
Upvotes: 0
Reputation: 662
Rather than using a regex on HTML, it's easier to parse the document. I'm not sure what language you're using, so I'll give an example of a parser in Java: jsoup lets you access the document using CSS selectors, which makes things much easier. Take a look through the tutorials and see if that helps.
Regexes are nice and powerful, don't get me wrong, but give a parser a try.
Upvotes: 0
Reputation: 71558
PHP supports recursive syntax (like (?R)), so you can use this regex:
\[((?:[^\[\]]+|(?R))+)\]
The results are: if:name <strong>[name]</strong>, id
(?R) recurses into the whole pattern, hence 'recursive'. The other characters should be easy enough to understand; if not, regex101 provides quite a comprehensive description of the components of the regex :)
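If your engine has no (?R) (Python's built-in `re` does not, for instance), the same top-level fields can be pulled out with a plain depth counter. The following is a minimal sketch of that swapped-in technique, not the regex itself; `top_level_fields` is a made-up name, and it does not try to reproduce every edge case of the pattern (empty or unbalanced brackets, for example).

```python
def top_level_fields(text):
    """Return the contents of each outermost [...] group, nested brackets kept."""
    fields = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "[":
            if depth == 0:
                start = i + 1      # remember where the outermost group begins
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:         # the outermost group just closed
                fields.append(text[start:i])
    return fields


text = (
    '[if:name <strong>[name]</strong>]\n'
    '<a href="http://www.example.com/place/[id]/">For more info...</a>'
)
print(top_level_fields(text))
# ['if:name <strong>[name]</strong>', 'id']
```

Running the function again on a returned field (here the `if:name ...` one) yields the inner `name`, which is roughly what the recursion buys you inside the regex.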
Upvotes: 2