Reputation: 605
I am student and I want to implement a top-down parser in my C# language translation project. For example, if I need to construct a parser tree for the sentence "My name is Husni and I am a student", how can I do it with C#?
Upvotes: 1
Views: 4078
Reputation: 1141
You are trying to parse a natural language which is ambiguous. This means your parser would allow multiple parse trees for a sentence. That's why I don't think regular language design tools like ANTLR would help.
I'm using PEP a top-down dynamic CFG parser. It's written in Java. Porting it to c# would be easier than writing a new one from scratch.
Upvotes: 0
Reputation: 74277
You install Antlr: http://www.antlr.org/
A great compiler construction tool that generates top-down recursive descent parsers from formal grammar specifications.
And you get a copy of Terrance Parr's books:
Another choice would be Irony.Net: http://irony.codeplex.com/
Irony is a development kit for implementing languages on .NET platform. Unlike most existing yacc/lex-style solutions Irony does not employ any scanner or parser code generation from grammar specifications written in a specialized meta-language. In Irony the target language grammar is coded directly in c# using operator overloading to express grammar constructs. Irony's scanner and parser modules use the grammar encoded as c# class to control the parsing process.
Here's a sample Irony expression grammar for parsing arithmetic expressions:
using System;
using System.Collections.Generic;
using System.Text;
using Irony.Parsing;
using Irony.Ast;
namespace Irony.Samples
{
// This grammar describes programs that consist of simple expressions and assignments
// for ex:
// x = 3
// y = -x + 5
// the result of calculation is the result of last expression or assignment.
// Irony's default runtime provides expression evaluation.
// supports inc/dec operators (++,--), both prefix and postfix,
// and combined assignment operators like +=, -=, etc.
[Language("ExpressionEvaluator", "1.0", "Multi-line expression evaluator")]
public class ExpressionEvaluatorGrammar : Irony.Parsing.Grammar
{
public ExpressionEvaluatorGrammar()
{
// 1. Terminals
var number = new NumberLiteral("number");
//Let's allow big integers (with unlimited number of digits):
number.DefaultIntTypes = new TypeCode[] { TypeCode.Int32, TypeCode.Int64, NumberLiteral.TypeCodeBigInt };
var identifier = new IdentifierTerminal("identifier");
var comment = new CommentTerminal("comment", "#", "\n", "\r");
//comment must to be added to NonGrammarTerminals list; it is not used directly in grammar rules,
// so we add it to this list to let Scanner know that it is also a valid terminal.
base.NonGrammarTerminals.Add(comment);
// 2. Non-terminals
var Expr = new NonTerminal("Expr");
var Term = new NonTerminal("Term");
var BinExpr = new NonTerminal("BinExpr", typeof(BinExprNode));
var ParExpr = new NonTerminal("ParExpr");
var UnExpr = new NonTerminal("UnExpr", typeof(UnExprNode));
var UnOp = new NonTerminal("UnOp");
var BinOp = new NonTerminal("BinOp", "operator");
var PostFixExpr = new NonTerminal("PostFixExpr", typeof(UnExprNode));
var PostFixOp = new NonTerminal("PostFixOp");
var AssignmentStmt = new NonTerminal("AssignmentStmt", typeof(AssigmentNode));
var AssignmentOp = new NonTerminal("AssignmentOp", "assignment operator");
var Statement = new NonTerminal("Statement");
var ProgramLine = new NonTerminal("ProgramLine");
var Program = new NonTerminal("Program", typeof(StatementListNode));
// 3. BNF rules
Expr.Rule = Term | UnExpr | BinExpr | PostFixExpr;
Term.Rule = number | ParExpr | identifier;
ParExpr.Rule = "(" + Expr + ")";
UnExpr.Rule = UnOp + Term;
UnOp.Rule = ToTerm("+") | "-" | "++" | "--";
BinExpr.Rule = Expr + BinOp + Expr;
BinOp.Rule = ToTerm("+") | "-" | "*" | "/" | "**";
PostFixExpr.Rule = Term + PostFixOp;
PostFixOp.Rule = ToTerm("++") | "--";
AssignmentStmt.Rule = identifier + AssignmentOp + Expr;
AssignmentOp.Rule = ToTerm("=") | "+=" | "-=" | "*=" | "/=";
Statement.Rule = AssignmentStmt | Expr | Empty;
ProgramLine.Rule = Statement + NewLine;
Program.Rule = MakeStarRule(Program, ProgramLine);
this.Root = Program; // Set grammar root
// 4. Operators precedence
RegisterOperators(1, "+", "-");
RegisterOperators(2, "*", "/");
RegisterOperators(3, Associativity.Right, "**");
// 5. Punctuation and transient terms
RegisterPunctuation("(", ")");
RegisterBracePair("(", ")");
MarkTransient(Term, Expr, Statement, BinOp, UnOp, PostFixOp, AssignmentOp, ProgramLine, ParExpr);
//automatically add NewLine before EOF so that our BNF rules work correctly when there's no final line break in source
this.LanguageFlags = LanguageFlags.CreateAst | LanguageFlags.NewLineBeforeEOF | LanguageFlags.CanRunSample;
}
}
}//namespace
A third option would be to use something like NParsec, a C# port of Haskell's Parsec (monadic parser combinators — in C#, essentially using Linq to write parsers): http://www.haskell.org/haskellwiki/Parsec#Parsec_clones_in_other_languages, or other such library, like Rx Parser: http://rxx.codeplex.com/wikipage?title=Parsers
More on monadic parser combinators here:
Upvotes: 0
Reputation: 33252
After the book you can also find interesting to read about a compiler generator as ANTLR that can help you to write the compiler ( also in C# ) and browsing the AST even visually.
Upvotes: 0
Reputation: 14223
I highly recommend this book:
You can download the PDF for free. It covers parsing (both top down and bottom up) in a comprehensive way without making too many assumptions about your background.
Very good read.
As for how to do it in C#? The same way you'd do it in any other language, just using C# syntax. Learn the theory and the code comes naturally.
Upvotes: 3