In Parslet, how to reconstruct substrings from parse subtrees?

Question

I'm writing a parser for strings with interpolated name-value arguments, e.g.: 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.' The argument values are code, which has its own set of parse rules.

Here's a version of my parser, simplified to only allow basic arithmetic as code:

require 'parslet'
require 'ap'
class TestParser < Parslet::Parser
  rule :integer do match('[0-9]').repeat(1).as :integer end
  rule :space do match('[\s\n]').repeat(1) end
  rule :parens do str('(') >> code >> str(')') end
  rule :operand do integer | parens end
  rule :addition do (operand.as(:left) >> space >> str('+') >> space >> operand.as(:right)).as :addition end
  rule :code do addition | operand end
  rule :name do match('[a-z]').repeat 1 end
  rule :argument do name.as(:name) >> str(':') >> space >> code.as(:value) end
  rule :arguments do argument >> (str(',') >> space >> argument).repeat end
  rule :interpolation do str('#{') >> arguments.as(:arguments) >> str('}') end
  rule :text do (interpolation.absent? >> any).repeat(1).as(:text) end
  rule :segments do (interpolation | text).repeat end
  root :segments
end
string = 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.'
ap TestParser.new.parse(string), index: false

Since the code has its own parse rules (to ensure valid syntax), the argument values are parsed into a subtree (with parentheses etc. replaced by nesting within the subtree):

[
    {
        :text => "This sentence "@0
    },
    {
        :arguments => [
            {
                 :name => "x"@16,
                :value => {
                    :integer => "2"@19
                }
            },
            {
                 :name => "y"@22,
                :value => {
                    :addition => {
                         :left => {
                            :addition => {
                                 :left => {
                                    :integer => "2"@26
                                },
                                :right => {
                                    :integer => "5"@30
                                }
                            }
                        },
                        :right => {
                            :integer => "3"@35
                        }
                    }
                }
            }
        ]
    },
    {
        :text => " has stuff in it."@37
    }
]

However, I want to store the argument values as strings, so this would be the ideal result:

[
    {
        :text => "This sentence "@0
    },
    {
        :arguments => [
            {
                 :name => "x"@16,
                :value => "2"
            },
            {
                 :name => "y"@22,
                :value => "(2 + 5) + 3"
            }
        ]
    },
    {
        :text => " has stuff in it."@37
    }
]

How can I use the Parslet subtrees to reconstruct the argument-value substrings? I could write a code generator, but that seems like overkill: Parslet clearly has access to the substring position information at some point (although it might discard it).

Is it possible to leverage or hack Parslet to return the substring?

rcrogers · Accepted Answer

Here's the hack I ended up with. There are better ways to accomplish this, but they'd require more extensive changes. With this monkey patch, Parser#parse returns a Result instead of the bare tree: Result#tree gives the normal parse result, and Result#strings is a hash that maps subtree structures to the source strings they were parsed from.

module Parslet

  class Parser
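    # Pairs the usual parse tree with the subtree => source-string map.
    class Result < Struct.new(:tree, :strings); end

    # Wrap the input in a Source up front so its value_strings can be read after parsing.
    def parse(source, *args)
      source = Source.new(source) unless source.is_a? Source
      value = super(source, *args)
      Result.new value, source.value_strings
    end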
  end

  class Source
    # Prepended so this initialize wraps the original one, which it reaches via super.
    prepend Module.new {
      attr_reader :value_strings
      def initialize(*args)
        super(*args)
        @value_strings = {}
      end
    }
  end

  class Atoms::Base
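    prepend Module.new {
      def apply(source, *args)
        old_pos = source.bytepos
        super.tap do |success, value|
          next unless success
          # bytepos is a byte offset, so slice by bytes rather than by characters.
          string = source.instance_variable_get(:@str).string.byteslice(old_pos...source.bytepos)
          # Key the matched source text by the flattened value this atom produced.
          source.value_strings[flatten(value)] = string
        end
      end
    }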
  end

end
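
To get from that Result to the shape asked for in the question, one option is to walk the tree and swap each :value subtree for its entry in strings. The following is a minimal sketch, not part of the hack above: stringify_values is a hypothetical helper, and the tree and strings literals are hand-built stand-ins (plain Ruby strings instead of Parslet slices) that only mimic what the patched parser might return for the y argument.

```ruby
# Hypothetical helper: replace every :value subtree with the source text
# recorded for it in `strings`; everything else is copied unchanged.
def stringify_values(node, strings)
  case node
  when Array
    node.map { |child| stringify_values(child, strings) }
  when Hash
    node.each_with_object({}) do |(key, child), out|
      out[key] = if key == :value
        # Fall back to the subtree itself if no string was recorded for it.
        strings.fetch(child, child)
      else
        stringify_values(child, strings)
      end
    end
  else
    node
  end
end

# Stand-in data (assumed shapes, plain strings instead of Parslet slices).
inner = { addition: { left: { integer: '2' }, right: { integer: '5' } } }
outer = { addition: { left: inner, right: { integer: '3' } } }
tree = [
  { text: 'This sentence ' },
  { arguments: [{ name: 'y', value: outer }] }
]
strings = { inner => '(2 + 5)', outer => '(2 + 5) + 3' }

flat = stringify_values(tree, strings)
puts flat.inspect
```

With the patch loaded, the call would presumably be result = TestParser.new.parse(string) followed by stringify_values(result.tree, result.strings). Whether every lookup hits depends on the subtree keys in Result#strings matching the subtrees in Result#tree, which this sketch does not check against Parslet itself.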
