Reputation: 2331
I'm writing a parser for strings with interpolated name-value arguments, e.g.: 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.'
The argument values are code, which has its own set of parse rules.
Here's a version of my parser, simplified to only allow basic arithmetic as code:
require 'parslet'
require 'ap'

class TestParser < Parslet::Parser
  rule :integer do match('[0-9]').repeat(1).as :integer end
  rule :space do match('[\s\\n]').repeat(1) end
  rule :parens do str('(') >> code >> str(')') end
  rule :operand do integer | parens end
  rule :addition do (operand.as(:left) >> space >> str('+') >> space >> operand.as(:right)).as :addition end
  rule :code do addition | operand end
  rule :name do match('[a-z]').repeat 1 end
  rule :argument do name.as(:name) >> str(':') >> space >> code.as(:value) end
  rule :arguments do argument >> (str(',') >> space >> argument).repeat end
  rule :interpolation do str('#{') >> arguments.as(:arguments) >> str('}') end
  rule :text do (interpolation.absent? >> any).repeat(1).as(:text) end
  rule :segments do (interpolation | text).repeat end
  root :segments
end

string = 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.'
ap TestParser.new.parse(string), index: false
Since the code has its own parse rules (to ensure valid syntax), the argument values are parsed into a subtree (with parentheses etc. replaced by nesting within the subtree):
[
    {
        :text => "This sentence "@0
    },
    {
        :arguments => [
            {
                :name => "x"@16,
                :value => {
                    :integer => "2"@19
                }
            },
            {
                :name => "y"@22,
                :value => {
                    :addition => {
                        :left => {
                            :addition => {
                                :left => {
                                    :integer => "2"@26
                                },
                                :right => {
                                    :integer => "5"@30
                                }
                            }
                        },
                        :right => {
                            :integer => "3"@35
                        }
                    }
                }
            }
        ]
    },
    {
        :text => " has stuff in it."@37
    }
]
However, I want to store the argument values as strings, so this would be the ideal result:
[
    {
        :text => "This sentence "@0
    },
    {
        :arguments => [
            {
                :name => "x"@16,
                :value => "2"
            },
            {
                :name => "y"@22,
                :value => "(2 + 5) + 3"
            }
        ]
    },
    {
        :text => " has stuff in it."@37
    }
]
How can I use the Parslet subtrees to reconstruct the argument-value substrings? I could write a code generator, but that seems overkill -- Parslet clearly has access to the substring position information at some point (although it might discard it).
Is it possible to leverage or hack Parslet to return the substring?
Upvotes: 1
Views: 237
Reputation: 2331
Here's the hack I ended up with. There are better ways to accomplish this, but they'd require more extensive changes. Parser#parse now returns a Result. Result#tree gives the normal parse result, and Result#strings is a hash that maps subtree structures to source strings.
module Parslet
  class Parser
    class Result < Struct.new(:tree, :strings); end

    # Wrap the input in a Source so the recorded strings can be read back afterwards.
    def parse(source, *args)
      source = Source.new(source) unless source.is_a? Source
      value = super source, *args
      Result.new value, source.value_strings
    end
  end

  class Source
    prepend Module.new{
      attr_reader :value_strings

      def initialize(*args)
        super(*args)
        @value_strings = {}
      end
    }
  end

  class Atoms::Base
    prepend Module.new{
      # After every successful match, record the source text it consumed.
      def apply(source, *args)
        old_pos = source.bytepos
        super.tap do |success, value|
          next unless success
          # bytepos is a byte offset, so slice by bytes rather than by characters.
          string = source.instance_variable_get(:@str).string.byteslice(old_pos ... source.bytepos)
          source.value_strings[flatten(value)] = string
        end
      end
    }
  end
end
Upvotes: 1
Reputation: 21548
The tree produced is based on the use of as in your parser.
You can try removing them from anything in an expression so you get a single string match for the expression. This seems to be what you are after.
If you want the parsed tree for these expressions too, then you need to either:
Neither of these is ideal, but if speed is not vital, I would go with the re-parse option, i.e. remove the as atoms, and then later re-parse the expressions to trees as needed.
Since you rightly want to reuse the same rules, but this time need as captures throughout them, you could implement this by deriving a parser from your existing parser and implementing rules with the same names in terms of rule :x { super.x.as(:x)}
OR
You could have a general rule for expression that matches the whole expression without knowing what is in it.
e.g. str('#{') >> (str('\\}') | (str('}').absent? >> any)).repeat(0) >> str('}')
Then later you can parse each expression into a tree as needed. That way you are not repeating your rules. It assumes you can tell when your expression is complete without parsing the whole expression subtree.
Failing that, it leaves us with hacking parslet.
I don't have a solution here, just some hints.
Parslet has a module called "CanFlatten" that implements flatten and is used by as to convert the captured tree back to a single string. You are going to want to do something like this.
Alternatively, you need to change the succ method in Atoms::Base to return "[success/fail, result, consumed_upto_position]" so each match knows where it consumed up to. Then you can read from the source between the start position and end position to get the raw text back. The current position of the source at the point the parser matches should be the value you want.
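To illustrate just the position-tracking idea outside of Parslet, here is a small sketch using Ruby's StringScanner from the standard library. The helper consume_with_text is hypothetical and not part of Parslet. It remembers the position before a match, and if the match succeeds, slices the source between the old and new positions.

```ruby
require 'strscan'

# Hypothetical helper: runs the matcher block and, if it succeeds, returns
# the raw text consumed between the start and end positions.
def consume_with_text(scanner)
  start = scanner.pos
  matched = yield scanner
  return nil unless matched
  # StringScanner#pos is a byte offset, so slice by bytes.
  scanner.string.byteslice(start...scanner.pos)
end

scanner = StringScanner.new('(2 + 5) + 3} has stuff')
text = consume_with_text(scanner) do |s|
  s.scan(/\(\d+ \+ \d+\)/) && s.scan(/ \+ /) && s.scan(/\d+/)
end
p text          # => "(2 + 5) + 3"
p scanner.rest  # => "} has stuff"
```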
Good Luck.
Note: My example expression parser doesn't handle escaping of the escape character (left as an exercise for the reader).
Upvotes: 1