stslavik

Reputation: 3028

How do I break up a string around "{tags}"?

I am writing a function which can have two potential forms of input:

  1. This is {a {string}}
  2. This {is} a {string}

I call the substrings wrapped in curly brackets "tags". A string could contain any number of tags, and they could be nested arbitrarily deep.

I've tried writing a regular expression to grab the tags, which of course fails on the nested ones: it grabs {a {string} and misses the second closing curly bracket. I can see that this is a recursive problem, but after staring at the wrong answer for too long I feel like I'm missing something really obvious.

What can I do to separate out the potential tags into parts so that they can be processed and replaced?

The More Complicated Version

def parseTags(oBody, szText)
  if szText.match(/\{(.*)\}/)
    szText.scan(/\{(.*)\}/) do |outers|
      outers.each do |blah|
        if blah.match(/(.*)\}(.*)\{(.*)/)
          blah.scan(/(.*)\}(.*)\{(.*)/) do |inners|
            inners.each do |tags|
              szText = szText.sub("\{#{tags}\}", parseTags( oBody, tags ))
            end
          end
        else
          szText = szText.sub("\{#{blah}\}", parseTags( oBody, blah ))
        end
      end
    end
  end
  if szText.match(/(\w+)\.(\w+)(?:\.([A-Za-z0-9.\[\]": ]*))/)
    func = $1+"_"+$2
    begin
      szSub = self.send func, oBody, $3
    rescue StandardError => e
      szSub = "{Error: Function #{$1}_#{$2} not found}"
      $stdout.puts "DynamicIO Error Encountered: #{e}"
    end
    szText = szText.sub("#{$1}.#{$2}#{$3!=nil ? "."+$3 : ""}", szSub)
  end
  return szText
end

This was the result of tinkering too long. It's not clean, but it did work for a case similar to "1": {help.divider.red.sys.["{pc.login}"]} is replaced with ---------------[ Duwnel ]---------------. However, {pc.attr.str.dotmode} {ansi.col.red}|{ansi.col.reset} {pc.attr.pre.dotmode} {ansi.col.red}|{ansi.col.reset} {pc.attr.int.dotmode} implodes brilliantly, with random streaks of red and swatches of missing text.

To explain: {ansi.col.red} emits the ANSI code for red, {ansi.col.reset} ends the colored block, and {pc.attr.XXX.dotmode} displays a number between 1 and 10 as a row of "o"s.

Upvotes: 0

Views: 104

Answers (2)

Josh Voigts

Reputation: 4132

As others have noted, this is a perfect case for a parsing engine. Regular expressions don't tend to handle nested pairs well.

Treetop is an awesome PEG parser that you might be interested in taking a look at. The main idea is that you define everything that you want to parse (including whitespace) inside rules. The rules allow you to recursively parse things like bracket pairs.

Here's an example grammar for creating arrays of strings from nested bracket pairs. Usually grammars are defined in a separate file, but for simplicity I included the grammar at the end and loaded it with Ruby's DATA constant.

require 'treetop'

Treetop.load_from_string DATA.read

parser = BracketParser.new

p parser.parse('This is {a {string}}').value

#=> ["This is ", ["a ", ["string"]]]

p parser.parse('This {is} a {string}').value

#=> ["This ", ["is"], " a ", ["string"]]

__END__
grammar Bracket
   rule string
      (brackets / not_brackets)+
      {
         def value
            elements.map{|e| e.value }
         end
      }
   end

   rule brackets
      '{' string '}'
      {
         def value
            elements[1].value
         end
      }
   end

   rule not_brackets
      [^{}]+
      {
         def value
            text_value
         end
      }
   end
end
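
Once you have the nested arrays, replacing the tags can be a short recursive walk: expand the innermost arrays first, then hand each fully expanded tag to whatever does the lookup. Here is a minimal sketch, assuming the array shape shown in the output above. The `render` method and the upcasing block are hypothetical stand-ins for illustration; they are not part of Treetop or of the question's code.

```ruby
# Walks the nested arrays produced by the parser.
# Strings are plain text; each sub-array is the content of one {tag}.
# Inner tags are expanded before the tag that contains them.
def render(nodes, &resolve)
  nodes.map do |node|
    if node.is_a?(Array)
      # Expand the tag's content first, then resolve the finished tag name.
      resolve.call(render(node, &resolve))
    else
      node
    end
  end.join
end

# Hypothetical resolver: a real one would look the tag up and return
# its replacement text. Upcasing just makes the substitution visible.
nested = render(['This is ', ['a ', ['string']]]) { |tag| tag.upcase }
flat   = render(['This ', ['is'], ' a ', ['string']]) { |tag| tag.upcase }

puts nested #=> This is A STRING
puts flat   #=> This IS a STRING
```

Because the block only ever sees a tag whose inner tags have already been replaced, a case like {help.divider.red.sys.["{pc.login}"]} would reach the resolver with the login name already filled in.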

Upvotes: 2

Neil Slater

Reputation: 27207

Rather than fitting ever more complex regular expressions to this problem, I would recommend looking into one of Ruby's grammar-based parsing engines. Most of them let you define recursive, nested grammars.

Parslet might be a good place to start for your problem. The ERB-like example, although it does not demonstrate nesting, might be closest to your needs: https://github.com/kschiess/parslet/blob/master/example/erb.rb
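
To make the idea of a recursive grammar concrete without installing anything, here is a hand-written sketch of the same recursion using only `StringScanner` from Ruby's standard library. It is not Parslet code. The method names `parse_tags` and `parse_nodes` are made up for this example, and it assumes braces are never escaped.

```ruby
require 'strscan'

# Turns text with nested {tags} into nested arrays: plain text becomes
# strings, and each {...} pair becomes a sub-array of its contents.
def parse_tags(text)
  scanner = StringScanner.new(text)
  nodes = parse_nodes(scanner)
  # parse_nodes stops early only when it meets a '}' it cannot consume.
  raise ArgumentError, "unmatched '}' at offset #{scanner.pos}" unless scanner.eos?
  nodes
end

# Reads text and tags until the end of input or an unconsumed '}'.
def parse_nodes(scanner)
  nodes = []
  until scanner.eos?
    if (text = scanner.scan(/[^{}]+/))
      nodes << text
    elsif scanner.scan(/\{/)
      # Recurse for the tag body; its closing brace must come next.
      nodes << parse_nodes(scanner)
      raise ArgumentError, "missing '}' at offset #{scanner.pos}" unless scanner.scan(/\}/)
    else
      # The next character is '}', which belongs to the caller.
      break
    end
  end
  nodes
end

p parse_tags('This is {a {string}}') #=> ["This is ", ["a ", ["string"]]]
p parse_tags('This {is} a {string}') #=> ["This ", ["is"], " a ", ["string"]]
```

A parsing library gives you the same structure, with better error reporting and a grammar that is easier to extend than hand-rolled scanning.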

Upvotes: 1
