Husni Jabir
Husni Jabir

Reputation: 605

Implementing a Top Down Parser in C#

I am student and I want to implement a top-down parser in my C# language translation project. For example, if I need to construct a parser tree for the sentence "My name is Husni and I am a student", how can I do it with C#?

Upvotes: 1

Views: 4078

Answers (4)

hrzafer
hrzafer

Reputation: 1141

You are trying to parse a natural language which is ambiguous. This means your parser would allow multiple parse trees for a sentence. That's why I don't think regular language design tools like ANTLR would help.

I'm using PEP a top-down dynamic CFG parser. It's written in Java. Porting it to c# would be easier than writing a new one from scratch.

Upvotes: 0

Nicholas Carey
Nicholas Carey

Reputation: 74277

You install Antlr: http://www.antlr.org/

A great compiler construction tool that generates top-down recursive descent parsers from formal grammar specifications.

And you get a copy of Terrance Parr's books:

Another choice would be Irony.Net: http://irony.codeplex.com/

Irony is a development kit for implementing languages on .NET platform. Unlike most existing yacc/lex-style solutions Irony does not employ any scanner or parser code generation from grammar specifications written in a specialized meta-language. In Irony the target language grammar is coded directly in c# using operator overloading to express grammar constructs. Irony's scanner and parser modules use the grammar encoded as c# class to control the parsing process.

Here's a sample Irony expression grammar for parsing arithmetic expressions:

using System;
using System.Collections.Generic;
using System.Text;
using Irony.Parsing;
using Irony.Ast;

namespace Irony.Samples
{
  // This grammar describes programs that consist of simple expressions and assignments
  // for ex:
  // x = 3
  // y = -x + 5
  //  the result of calculation is the result of last expression or assignment.
  //  Irony's default  runtime provides expression evaluation. 
  //  supports inc/dec operators (++,--), both prefix and postfix,
  //  and combined assignment operators like +=, -=, etc.

  [Language("ExpressionEvaluator", "1.0", "Multi-line expression evaluator")]
  public class ExpressionEvaluatorGrammar : Irony.Parsing.Grammar
  {

    public ExpressionEvaluatorGrammar()
    {

      // 1. Terminals
      var number = new NumberLiteral("number");

      //Let's allow big integers (with unlimited number of digits):
      number.DefaultIntTypes = new TypeCode[] { TypeCode.Int32, TypeCode.Int64, NumberLiteral.TypeCodeBigInt };
      var identifier         = new IdentifierTerminal("identifier");
      var comment            = new CommentTerminal("comment", "#", "\n", "\r"); 

      //comment must to be added to NonGrammarTerminals list; it is not used directly in grammar rules,
      // so we add it to this list to let Scanner know that it is also a valid terminal. 
      base.NonGrammarTerminals.Add(comment);

      // 2. Non-terminals
      var Expr           = new NonTerminal("Expr");
      var Term           = new NonTerminal("Term");
      var BinExpr        = new NonTerminal("BinExpr", typeof(BinExprNode));
      var ParExpr        = new NonTerminal("ParExpr");
      var UnExpr         = new NonTerminal("UnExpr", typeof(UnExprNode));
      var UnOp           = new NonTerminal("UnOp");
      var BinOp          = new NonTerminal("BinOp", "operator");
      var PostFixExpr    = new NonTerminal("PostFixExpr", typeof(UnExprNode));
      var PostFixOp      = new NonTerminal("PostFixOp");
      var AssignmentStmt = new NonTerminal("AssignmentStmt", typeof(AssigmentNode));
      var AssignmentOp   = new NonTerminal("AssignmentOp", "assignment operator");
      var Statement      = new NonTerminal("Statement");
      var ProgramLine    = new NonTerminal("ProgramLine");
      var Program        = new NonTerminal("Program", typeof(StatementListNode));

      // 3. BNF rules
      Expr.Rule           = Term | UnExpr | BinExpr | PostFixExpr;
      Term.Rule           = number | ParExpr | identifier;
      ParExpr.Rule        = "(" + Expr + ")";
      UnExpr.Rule         = UnOp + Term;
      UnOp.Rule           = ToTerm("+") | "-" | "++" | "--";
      BinExpr.Rule        = Expr + BinOp + Expr;
      BinOp.Rule          = ToTerm("+") | "-" | "*" | "/" | "**";
      PostFixExpr.Rule    = Term + PostFixOp;
      PostFixOp.Rule      = ToTerm("++") | "--";
      AssignmentStmt.Rule = identifier + AssignmentOp + Expr;
      AssignmentOp.Rule   = ToTerm("=") | "+=" | "-=" | "*=" | "/=";
      Statement.Rule      = AssignmentStmt | Expr | Empty;
      ProgramLine.Rule    = Statement + NewLine;
      Program.Rule        = MakeStarRule(Program, ProgramLine);
      this.Root           = Program;       // Set grammar root

      // 4. Operators precedence
      RegisterOperators(1, "+", "-");
      RegisterOperators(2, "*", "/");
      RegisterOperators(3, Associativity.Right, "**");

      // 5. Punctuation and transient terms
      RegisterPunctuation("(", ")");
      RegisterBracePair("(", ")"); 
      MarkTransient(Term, Expr, Statement, BinOp, UnOp, PostFixOp, AssignmentOp, ProgramLine, ParExpr);

      //automatically add NewLine before EOF so that our BNF rules work correctly when there's no final line break in source
      this.LanguageFlags = LanguageFlags.CreateAst | LanguageFlags.NewLineBeforeEOF | LanguageFlags.CanRunSample; 

    }

  }

}//namespace

A third option would be to use something like NParsec, a C# port of Haskell's Parsec (monadic parser combinators — in C#, essentially using Linq to write parsers): http://www.haskell.org/haskellwiki/Parsec#Parsec_clones_in_other_languages, or other such library, like Rx Parser: http://rxx.codeplex.com/wikipage?title=Parsers

More on monadic parser combinators here:

Upvotes: 0

Felice Pollano
Felice Pollano

Reputation: 33252

After the book you can also find interesting to read about a compiler generator as ANTLR that can help you to write the compiler ( also in C# ) and browsing the AST even visually.

Upvotes: 0

riwalk
riwalk

Reputation: 14223

I highly recommend this book:

Basics of Compiler Design

You can download the PDF for free. It covers parsing (both top down and bottom up) in a comprehensive way without making too many assumptions about your background.

Very good read.

As for how to do it in C#? The same way you'd do it in any other language, just using C# syntax. Learn the theory and the code comes naturally.

Upvotes: 3

Related Questions