Discipulus Vitae

Reputation: 51

Parse string with nested brackets

String to parse (without spaces):

 "instrumentalist  (  bass  (upright  , fretless , 5-string ) ,  guitar  ( electric , acoustic ) ,  trumpet  ),  teacher  ,  songwriter,    producer"

I need to get this structure in Ruby:

["instrumentalist",[["bass",["upright","fretless","5-string"]],["guitar",["electric","acoustic"]],["trumpet"]],["teacher"],["songwriter"],["producer"]]

Because of the nested brackets and commas, String#partition couldn't help me. I don't know whether there is a fancy regex that could extract this kind of string. Or do I have to go with a lexer?

Upvotes: 2

Views: 1918

Answers (1)

Frederick Cheung

Reputation: 84182

A regex on its own isn't the right sort of tool for this type of problem, even though the basic process is simple: walk through your string looking for commas or brackets. When you find a comma, add the characters read since the previous delimiter to the current nesting level. When you find an opening bracket, the nesting level goes up by 1; when you find a closing bracket, it goes down by 1.
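
To make that walk concrete, here is a minimal sketch that uses no library at all. The name parse_nested is hypothetical, and the sketch assumes the brackets are balanced and that every opening bracket follows a name. Leaves come back as bare strings and bracketed groups as [name, children] pairs, which is close to, but not exactly, the shape asked for in the question.

```ruby
# Hypothetical helper: walks the string one character at a time.
# Assumes balanced brackets; an unmatched ")" would raise an error.
def parse_nested(input)
  root = []
  stack = [root]   # stack.last is the list for the current nesting level
  token = ''.dup   # characters read since the previous delimiter

  input.each_char do |ch|
    case ch
    when '('
      children = []
      stack.last << [token.strip, children]  # the name owns a child list
      stack << children                       # nesting level goes up by 1
      token = ''.dup
    when ',', ')'
      # Store the pending name, if any (there is none right after a ")").
      stack.last << token.strip unless token.strip.empty?
      token = ''.dup
      stack.pop if ch == ')'                  # nesting level goes down by 1
    else
      token << ch
    end
  end

  # Whatever is left after the last delimiter is the final item.
  stack.last << token.strip unless token.strip.empty?
  root
end

p parse_nested('a(b,c),d')  # => [["a", ["b", "c"]], "d"]
```

Because each name is stripped before it is stored, this version also accepts the string with its spaces left in.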

StringScanner is designed for this sort of thing, as it lets us walk through the string while maintaining some state, in this case a stack that mirrors your opening and closing brackets. Something like this does the job for me:

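require 'strscan'

def parse(input)
  scanner = StringScanner.new(input)
  stack = [[]]  # stack.last is the list for the current nesting level
  # Read a name, then the single delimiter that follows it.
  while (string = scanner.scan(/[^(),]+/))
    case scanner.scan(/[(),]/)
    when '('
      # The name opens a new level: store [name, children] and descend.
      new_nesting = [string, []]
      stack.last << new_nesting
      stack << new_nesting[1]
    when ')'
      # The name closes this level: skip a trailing comma and go back up.
      scanner.scan(/,/)
      stack.last << string
      stack.pop
    else
      # A comma or the end of the input: the name is a plain item.
      stack.last << string
    end
  end
  stack.last
end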
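
One caveat: the loop above expects a name before every delimiter, so it stops early on input with two closing brackets in a row, such as a(b(c)). If that matters, a possible variant is to read one token at a time instead. This is only a sketch under a few assumptions: parse_tokens is a hypothetical name, every opening bracket is expected to follow a name, and unbalanced input raises an ArgumentError rather than being repaired.

```ruby
require 'strscan'

# Hypothetical variant: consumes one token (name, "(", ")" or ",") per step.
def parse_tokens(input)
  scanner = StringScanner.new(input)
  root = []
  stack = [root]  # stack.last is the list for the current nesting level

  until scanner.eos?
    scanner.skip(/\s+/)  # ignore whitespace between tokens
    if scanner.scan(/\(/)
      # The name read just before "(" becomes the owner of a new child list.
      label = stack.last.pop
      children = []
      stack.last << [label, children]
      stack << children
    elsif scanner.scan(/\)/)
      raise ArgumentError, 'unbalanced )' if stack.size == 1
      stack.pop
    elsif scanner.scan(/,/)
      # Separators carry no data of their own.
    elsif (name = scanner.scan(/[^(),]+/))
      stack.last << name.strip
    end
  end

  raise ArgumentError, 'unclosed (' unless stack.size == 1
  root
end

p parse_tokens('a(b(c)),d')  # => [["a", [["b", ["c"]]]], "d"]
```

Since whitespace is skipped and names are stripped, the original string can be passed in without removing its spaces first.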

Upvotes: 9
