Reputation: 51
String to parse (without spaces):
"instrumentalist ( bass (upright , fretless , 5-string ) , guitar ( electric , acoustic ) , trumpet ), teacher , songwriter, producer"
I need to get this structure in Ruby:
["instrumentalist",[["bass",["upright","fretless","5-string"]],["guitar",["electric","acoustic"]],["trumpet"]],["teacher"],["songwriter"],["producer"]]
Because of the nested "(", ")" and "," characters, String#partition couldn't help me. I don't know whether there is a fancy regex that could extract this kind of string, or do I have to write a lexer?
Upvotes: 2
Views: 1918
Reputation: 84182
A regex on its own isn't the right tool for this type of problem, even though the basic process is simple: walk through your string looking for commas or brackets. When you find a comma, add the characters read so far to the current nesting level. When you find an open bracket, the nesting level goes up by 1; when you find a close bracket, it goes down by 1.
StringScanner is designed for this sort of thing, as it lets us walk through the string while maintaining some state, in this case a stack that mirrors your opening and closing brackets. Something like this does the job for me:
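require 'strscan'

# Assumes the input has no spaces and never has two closing brackets in a row.
def parse(input)
  scanner = StringScanner.new(input)
  stack = [[]]  # stack.last is the list for the current nesting level
  # Read a name, then look at the single delimiter that follows it.
  while (string = scanner.scan(/[^(),]+/))
    case scanner.scan(/[(),]/)
    when '('
      # The name opens a group: store [name, children] and descend into children.
      new_nesting = [string, []]
      stack.last << new_nesting
      stack << new_nesting[1]
    when ')'
      # The name is the last item of its group: consume an optional trailing
      # comma, store the name, then go back up one level.
      scanner.scan(/,/)
      stack.last << string
      stack.pop
    else
      # A comma or the end of the input: store the name at the current level.
      stack.last << string
    end
  end
  stack.last
end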
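If your real input keeps its spaces, or can close several brackets in a row (for example "a(b(c))"), reading one token at a time is a bit more forgiving. The following is only a sketch of that idea, not a drop-in replacement: the method name parse_nested and the pending variable are made up for illustration, and the ArgumentError checks are my own assumption about how you might want unbalanced brackets reported.

```ruby
require 'strscan'

# Hypothetical variant: reads one token at a time, so surrounding whitespace
# and back-to-back closing brackets are tolerated.
def parse_nested(input)
  scanner = StringScanner.new(input)
  stack = [[]]   # stack.last is the list for the current nesting level
  pending = nil  # a name that has been read but not stored yet

  until scanner.eos?
    if (name = scanner.scan(/[^(),]+/))
      # A run of non-delimiter characters; whitespace-only runs are ignored.
      name = name.strip
      pending = name.empty? ? nil : name
    else
      case scanner.getch
      when '('
        # The pending name opens a group: store [name, children] and descend.
        group = [pending, []]
        stack.last << group
        stack << group[1]
      when ','
        stack.last << pending if pending
      when ')'
        stack.last << pending if pending
        raise ArgumentError, 'unbalanced )' if stack.size == 1
        stack.pop
      end
      pending = nil
    end
  end

  stack.last << pending if pending
  raise ArgumentError, 'unbalanced (' unless stack.size == 1
  stack.first
end
```

Called on the original string with its spaces left in, this should give the same nesting as the version above: groups come back as [name, children] pairs and plain items as strings.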
Upvotes: 9