Reputation: 3028
I am writing a function which can have two potential forms of input:
This is {a {string}}
This {is} a {string}
I call the sub-strings wrapped in curly-brackets "tags". I could potentially have any number of tags in a string, and they could be nested arbitrarily deep.
I've tried writing a regular expression to grab the tags, which of course fails on the nested tags, grabbing {a {string}
, missing the second curly bracket. I can see it as a recursive problem, but after staring at the wrong answer too long I feel like I'm blind to seeing something really obvious.
What can I do to separate out the potential tags into parts so that they can be processed and replaced?
The More Complicated Version
def parseTags( oBody, szText )
if szText.match(/\{(.*)\}/)
szText.scan(/\{(.*)\}/) do |outers|
outers.each do |blah|
if blah.match(/(.*)\}(.*)\{(.*)/)
blah.scan(/(.*)\}(.*)\{(.*)/) do |inners|
inners.each do |tags|
szText = szText.sub("\{#{tags}\}", parseTags( oBody, tags ))
end
end
else
szText = szText.sub("\{#{blah}\}", parseTags( oBody, blah ))
end
end
end
end
if szText.match(/(\w+)\.(\w+)(?:\.([A-Za-z0-9.\[\]": ]*))/)
func = $1+"_"+$2
begin
szSub = self.send func, oBody, $3
rescue Exception=>e
szSub = "{Error: Function #{$1}_#{$2} not found}"
$stdout.puts "DynamicIO Error Encountered: #{e}"
end
szText = szText.sub("#{$1}.#{$2}#{$3!=nil ? "."+$3 : ""}", szSub)
end
return szText
end
This was the result of tinkering too long. It's not clean, but it did work for a case similar to "1" - {help.divider.red.sys.["{pc.login}"]}
is replaced with ---------------[ Duwnel ]---------------
. However, {pc.attr.str.dotmode} {ansi.col.red}|{ansi.col.reset} {pc.attr.pre.dotmode} {ansi.col.red}|{ansi.col.reset} {pc.attr.int.dotmode}
implodes brilliantly, with random streaks of red and swatches of missing text.
To explain, anything marked {ansi.col.red}
marks an ansi red code, reset escapes the color block, and {pc.attr.XXX.dotmode}
displays a number between 1 and 10 in "o"s.
Upvotes: 0
Views: 104
Reputation: 4132
As others have noted, this is a perfect case for a parsing engine. Regular expressions don't tend to handle nested pairs well.
Treetop is an awesome PEG parser that you might be interested in taking a look at. The main idea is that you define everything that you want to parse (including whitespace) inside rules. The rules allow you to recursively parse things like bracket pairs.
Here's an example grammar for creating arrays of strings from nested bracket pairs. Usually grammars are defined in a separate file, but for simplicity I included the grammar at the end and loaded it with Ruby's DATA
constant.
require 'treetop'
Treetop.load_from_string DATA.read
parser = BracketParser.new
p parser.parse('This is {a {string}}').value
#=> ["This is ", ["a ", ["string"]]]
p parser.parse('This {is} a {string}').value
#=> ["This ", ["is"], " a ", ["string"]]
__END__
grammar Bracket
rule string
(brackets / not_brackets)+
{
def value
elements.map{|e| e.value }
end
}
end
rule brackets
'{' string '}'
{
def value
elements[1].value
end
}
end
rule not_brackets
[^{}]+
{
def value
text_value
end
}
end
end
Upvotes: 2
Reputation: 27207
I would recommend instead of fitting more complex regular expressions to this problem, that you look into one of Ruby's grammar-based parsing engines. It is possible to design recursive and nested grammars in most of these.
parslet
might be a good place to start for your problem. The erb-alike example, although it does not demonstrate nesting, might be closest to your needs: https://github.com/kschiess/parslet/blob/master/example/erb.rb
Upvotes: 1