josh

Reputation: 1554

Parsing, matching and keywords

I am working with the Java15 grammar and have a couple of questions about how Rascal's parser works and why some things aren't working. Given a concrete syntax:

module tests::Concrete

start syntax CompilationUnit =
    compilationUnit: TypeDec* LAYOUTLIST
    ;

syntax TypeDec =
    ClassDec
    ;

syntax ClassDec =
    \class: ClassDecHead ClassBody
    ;

syntax ClassDecHead =
    "class" Id
    ;

syntax ClassBody =
    "{" ClassBodyDec* "}"
    ;

syntax ClassBodyDec =
    ClassMemberDec
    ;

syntax ClassMemberDec =
    MethodDec
    ;

syntax MethodDec =
    \method: MethodDecHead
    ;

syntax MethodDecHead =
    ResultType Id
    ;

syntax ResultType =
    \void: "void"
    ;

syntax Id =
    \id: [A-Z_a-z] !<< ID \ IDKeywords !>> [0-9A-Z_a-z]
    ;

keyword Keyword =
    "void"
    ;

keyword IDKeywords =
    "null"
    | Keyword
    ;

lexical LAYOUT =
    [\t-\n \a0C-\a0D \ ]
    ;

lexical ID =
    [A-Z_a-z] [0-9A-Z_a-z]*
    ;

layout LAYOUTLIST  =
    LAYOUT* !>> [\t-\n \a0C-\a0D \ ] !>> (  [/]  [*]  ) !>> (  [/]  [/]  ) !>> "/*" !>> "//"
    ;

an AST definition:

module tests::Abstract

data Declaration =
    \compilationUnit(list[Declaration] body)
    | \package(ID name)
    | \import(ID name)
    | \class(ID name, list[Declaration] body)
    | \method(Type ret, ID name)
    ;

data Type =
    \void()
    ;

data ID =
    \id(str id)
    ;

and a driver to load files:

module tests::Load

import Prelude;
import tests::Concrete;
import tests::Abstract;

public Declaration load(loc l) = implode(#Declaration, parse(#CompilationUnit, l));

I'm finding some oddities in what is actually working and what isn't. If I take the program:

class A {

}

This parses as expected into: compilationUnit([class(id("A"),[])]). But parsing and constructing AST nodes for methods inside the class is proving to be a bit hairy. Given the program:

class A {
    void f
}

This produces a "Cannot find a constructor for Declaration" error. If I modify the syntax to be:

syntax MethodDecHead =
    ResultType
    ;

and the AST to be:

| \method(Type ret)

I'm able to get the tree I would expect: compilationUnit([class(id("A"),[method(void())])])

I'm confused about what's going on here: how keywords are handled and what's causing this behaviour.

In addition, if I don't add the LAYOUTLIST to the end of the start syntax production, I get a ParseError any time I try to read from a file.

Upvotes: 2

Views: 247

Answers (2)

Tijs van der Storm

Reputation: 111

The production rule of ClassDec is not compatible with the AST node class. Changing it to:

syntax ClassDec = \class: "class" Id "{" ClassBodyDec* "}" ;

makes it more regular and isomorphic with the AST node class(ID name, list[Declaration]).

However, the names should always correspond, so I'd suggest changing ID to Id in the grammar. Further, your AST node expects Declarations, but in the grammar you have ClassBodyDecs.

The general rules for implode are:

  • Non-terminal corresponds to ADT type
  • Production label corresponds to ADT constructor
  • Keywords, operators, layout, etc. are skipped.
  • Unlabeled lexical productions map to primitives (str, int, real).
  • Labeled lexicals can map to constructors if you want: for example, lexical Id = id: [a-z]+; can map to data Id = id(str x);
  • If you don't label context-free productions, implode "looks over them": if I had syntax A = B; syntax B = cons: "bla"; then I could use the ADT data A = cons();

(These rules are documented in ParseTree.rsc, https://github.com/cwi-swat/rascal/blob/master/src/org/rascalmpl/library/ParseTree.rsc)
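
Applied to the method case in the question, a hedged sketch. As the implode documentation in ParseTree.rsc describes it, an unlabeled production with a single argument is skipped, but an unlabeled production with more than one argument (such as MethodDecHead = ResultType Id) is imploded to a tuple. That would hand \method a single tuple argument rather than the two arguments that method(Type ret, ID name) expects. Inlining the head into the labeled production, in the same way as the ClassDec rewrite, should make the arities line up. This is an assumption drawn from the documentation and has not been run against the question's grammar:

```rascal
// Labeled production with two AST children: ResultType and Id.
// Intended to line up with \method(Type ret, ID name) in tests::Abstract.
syntax MethodDec =
    \method: ResultType Id
    ;
```

With this shape, MethodDecHead is no longer needed for the method case.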

Upvotes: 2

Jurgen Vinju

Reputation: 6696

I'm not the expert on implode, so I'll leave that for now, but the LAYOUTLIST issue is due to the way parse is called.

Every start non-terminal defined by start Something = produces two types, namely:

  • the non-terminal itself, Something, and
  • a wrapper non-terminal named start[Something].

The wrapper is automatically/implicitly defined by this:

syntax start[Something] = LAYOUTLIST before Something top LAYOUTLIST after;

So, if you want to have whitespace and comments before and after your program you call parse like so:

parse(#start[Something], yourLocation)

And if you are not interested in keeping the comments or whitespace for later, then you could project out the top tree like so:

Something mySomething = parse(#start[Something], myLocation).top;
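
Applied to the driver in the question, a minimal sketch could look like the following. It assumes the trailing LAYOUTLIST has been removed from the CompilationUnit production, since the start[CompilationUnit] wrapper already accepts layout on both sides. Whether implode then succeeds still depends on the grammar and the AST lining up.

```rascal
module tests::Load

import Prelude;
import tests::Concrete;
import tests::Abstract;

// Parse via the implicit start[CompilationUnit] wrapper so that layout before
// and after the program is accepted, then project out the CompilationUnit
// with .top before imploding it.
public Declaration load(loc l) =
    implode(#Declaration, parse(#start[CompilationUnit], l).top);
```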

Upvotes: 1
