Reputation: 132778

Can I insert named captures in the Match tree without actually matching anything?

I was curious whether I could insert things into the Match tree without actually matching anything. There's no associated problem I'm trying to solve.

In this example, I have a token market that checks that its match is a key in the hash. I was trying to then insert the value of that hash into the match tree somehow. I figured I could have a token that always matches, long_market_string, and then look into the tree somehow to see what market had matched.

grammar OrderNumber::Grammar {
    token TOP    {
        <channel> <product> <market> <long_market_string> '/' <revision>
        }

    token channel    { <[ M F P ]> }
    token product    { <[ 0..9 A..Z ]> ** 4 }

    token market     {
        (<[ A..Z ]>** 1..2) <?{ %Market_Shortcode{$0}:exists }>
        }

    # this should figure out what market matched
    # I don't particularly care how this happens as long as
    # I can insert this into the match tree
    token long_market_string { <?> }

    token revision   { <[ A..C ]> }
    }

Is there some way to mess with the Match tree as it is being created?

I could do something that inverts things:

grammar AppleOrderNumber::Grammar {
    token TOP    {
        <channel> <product> <long_market_string> '/' <revision>
        }

    token channel    { <[ M F P ]> }
    token product    { <[ 0..9 A..Z ]> ** 4 }

    token market     {
        (<[ A..Z ]>** 1..2) <?{ %Market_Shortcode{$0}:exists }>
        }
    token long_market_string { <market> }
    token revision   { <[ A..C ]> }
    }

But that only handles that one case. I'm more interested in inserting an arbitrary number of things.

Upvotes: 3

Answers (2)

arnsholt

Reputation: 871

It sounds like you want to subvert the match tree into doing something the match tree isn't really supposed to do. The match tree tracks what substrings were matched where in the input string, not arbitrary data generated by the parser. If you want to track arbitrary data, what's wrong with the AST?

Sure, in one sense the AST has to mirror the parse tree, since it's constructed in a bottom-up fashion as the match methods complete successfully. But the AST itself, in the sense of "the object attached to any given node", is not so restricted. Consider for example:

grammar G {
    token TOP { <foo> <bar> {make "TOP is: " ~ $<foo> ~ $<bar>} }
    token foo { foo {make "foo"} }
    token bar { bar {make "bar"} }
}
G.parse("foobar");

Here $/.made will simply be the string "TOP is: foobar", while the match tree has child nodes with the components that were used to construct the top-level AST. If we then return to your example, we can make it:

grammar G {
    my %Market_Shortcode = :AA('Double A');
    token TOP    {
        <channel> <product> <market>
        {} # Force the computation of the $/ object. Note that this will also terminate LTM here.
        <long_market_string(~$<market>)> '/' <revision>
        }

    token channel    { <[ M F P ]> }
    token product    { <[ 0..9 A..Z ]> ** 4 }

    token market     {
        (<[ A..Z ]>** 1..2) <?{ %Market_Shortcode{$0}:exists }>
        }

    token long_market_string($shortcode) { <?> { say 'c='~$shortcode; make %Market_Shortcode{$shortcode} } }

    token revision   { <[ A..C ]> }
    }

G.parse('M0000AA/A');

$<long_market_string>.ast will now be 'Double A'. Of course, I'd probably dispense with token long_market_string and just make the AST of token market whatever is in %Market_Shortcode (or a Market object with both short and long name, if you want to track both at once).
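
As a rough sketch of that simpler variant, the following drops the extra token and attaches the long name directly to the market node. The grammar name OrderNumber and the contents of %Market_Shortcode are made up for illustration, since the real table isn't shown in the question:

```raku
grammar OrderNumber {
    # Hypothetical lookup table; the real %Market_Shortcode is not shown.
    my %Market_Shortcode = AA => 'Double A', B => 'Single B';

    token TOP      { <channel> <product> <market> '/' <revision> }
    token channel  { <[ M F P ]> }
    token product  { <[ 0..9 A..Z ]> ** 4 }

    token market   {
        (<[ A..Z ]> ** 1..2)
        <?{ %Market_Shortcode{~$0}:exists }>
        # Attach the long name as this node's AST payload.
        { make %Market_Shortcode{~$0} }
    }

    token revision { <[ A..C ]> }
}

my $m = OrderNumber.parse('M0000AA/A');
say ~$m<market>;      # AA        (the matched text)
say $m<market>.made;  # Double A  (the attached payload)
```

The match tree still only records what was matched; the extra data rides along in .made on the same node.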

A less trivial example of this kind of thing would be something like a grammar of Python. Since Python's block-level structure is line-based, your grammar (and thus match tree) needs to reflect this in some way. But you can also chain several simple statements together on a single line by separating them with semicolons. Now, you'll probably want the AST of a block to be a list of statements, while the AST of a single line may itself be a list of several statements. Thus you'd construct the AST of the block by (for example) flat-mapping together the lists from the lines (or something along those lines, depending on how you represent block statements like if and while).
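
To make the flat-mapping idea concrete, here is a toy sketch. It is nowhere near a Python grammar: the names Lines and LinesActions are invented for this example, and a "statement" is just a run of word characters. The point is only that the match tree keeps the line structure while the top-level AST is a flat list:

```raku
grammar Lines {
    token TOP       { <line>+ % \n }
    token line      { <statement>+ % [ \h* ';' \h* ] }
    token statement { \w+ }
}

class LinesActions {
    # Each statement's AST is just its text.
    method statement($/) { make ~$/ }
    # A line's AST is the list of its statements.
    method line($/)      { make $<statement>».made }
    # The block's AST flattens the per-line lists into one list.
    method TOP($/)       { make $<line>.map(*.made.Slip).List }
}

my $match = Lines.parse("a; b\nc", :actions(LinesActions));
my $ast   = $match.made;

say $match<line>.elems;  # 2  (two lines in the match tree)
say $ast.join(', ');     # a, b, c  (three statements in the AST)
```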

Now, if you really, really, really want to do nasty things to the match tree, I'm pretty sure it can be done, of course. You'll have to implement the parsing code yourself as a method long_market_string, using an API that is undocumented and internal, and it will likely involve at least some dropping down into nqp:: ops. The stuff pointed to here will probably be useful. Other relevant files are src/core/{Match,Cursor}.pm in the Rakudo repo. Note also that the stringification of Matches is computed by extracting the matched substring from the input string, so if you want it to stringify usefully, you'll have to subclass Match.

Upvotes: 0

Brad Gilbert

Reputation: 34120

Tokens are a type of method, so if you wrote a method that did all of the setup work that a token does for you, you could do almost anything.

This is not specced, and is currently not easy.
( I only have a vague idea of where to start looking in the source code to figure it out )

What you can do easily is add to the .made/.ast of the result
( .made and .ast are synonyms )

$/ = grammar {
  token TOP {
    .*
    {
      make 'World'
    }
  }
}.parse('Hello');

say "$/ $/.made()";  # Hello World

It doesn't even have to be inside of a grammar

'asdf' ~~ /{make 42}/;
say $/;     # ｢｣
say $/.made # 42

Most of the time you would use an Actions class for this type of thing

grammar example-grammar {
  token TOP {
    [ <number> | <word> ]+ % \s*
  }
  token word {
    <.alpha>+
  }
  token number {
    \d+
    { make +$/ }
  }
}

class example-actions {
  method TOP    ($/) { make $/.pairs.map:{ .key => .value».made} }
  method number ($/) { #`( already done in grammar, so this could be removed ) }
  method word   ($/) { make ~$/ }
}

.say for example-grammar.parse(
  'Hello 123 World',
  :actions(example-actions)
).made».perl

# :number([123])
# :word(["Hello", "World"])

Upvotes: 1
