Reputation: 8977
As part of an upcoming project, I'd like to set it up so that a certain domain object can apply to tags or combinations of tags.
I'd like to be able to have users enter these combinations in a human-readable way, similar to:
Does a tool-set exist to do this kind of logic parsing from one string of entered text? I could define the tags behind the scenes with a certain distinction ({}, [], etc.) so that they can be parsed out more easily as well.
Just wondering what the best way is to parse that human-readable text into those distinct sets of combinations without the user having to enter each specific combination.
Thanks!
Upvotes: 6
Views: 293
Reputation: 66614
Typically this involves two steps: lexing (short for lexical analysis) and parsing.
In the first step, the input string is transformed into a sequence of lexical items, called tokens. For this purpose, you might declare an enum type for the different types of token, e.g.:
public enum TokenType
{
OpenParenthesis,
CloseParenthesis,
And,
Or,
Tag
}
and a class for tokens:
sealed class Token
{
public TokenType Type { get; private set; }
public string Item { get; private set; }
public Token(TokenType type, string item) { Type = type; Item = item; }
}
Now you write an algorithm that turns the input string, e.g. tag-a and (tag-b or tag-c)
, into a sequence of Token
instances. You can use regular expressions to recognise the various items, for example @"\s*\(\s*"
would be the regular expression to recognise open-parentheses. The finished sequence would then look something like this:
new Token(TokenType.Tag, "tag-a")
new Token(TokenType.And, null)
new Token(TokenType.OpenParenthesis, null)
new Token(TokenType.Tag, "tag-b")
new Token(TokenType.Or, null)
new Token(TokenType.Tag, "tag-c")
new Token(TokenType.CloseParenthesis, null)
Once you have this sequence, you need to run a parser on it. There are many strategies for parsing expressions like these; for the beginning, I recommend to you the recursive descent parser.
You will, of course, need a few classes to contain the parse tree:
abstract class Node { }
enum BooleanOperator { And, Or }
sealed class BooleanNode : Node
{
public BooleanOperator Operator { get; private set; }
public Node Left { get; private set; }
public Node Right { get; private set; }
public BooleanNode(BooleanOperator op, Node left, Node right)
{
Operator = op;
Left = left;
Right = right;
}
}
sealed class TagNode : Node
{
public string Tag { get; private set; }
public TagNode(string tag) { Tag = tag; }
}
And then a recursive descent parser might look something like this:
public static Node ParseExpression(Token[] tok)
{
int i = 0;
return parseExpressionBoolOr(tok, ref i);
}
private static Node parseExpressionBoolOr(Token[] tok, ref int i)
{
var left = parseExpressionBoolAnd(tok, ref i);
while (tok[i].Type == TokenType.Or)
{
i++;
var right = parseExpressionBoolAnd(tok, ref i);
left = new BooleanNode(BooleanOperator.Or, left, right);
}
return left;
}
private static Node parseExpressionBoolAnd(Token[] tok, ref int i)
{
var left = parseExpressionPrimary(tok, ref i);
while (tok[i].Type == TokenType.And)
{
i++;
var right = parseExpressionPrimary(tok, ref i);
left = new BooleanNode(BooleanOperator.And, left, right);
}
return left;
}
private static Node parseExpressionPrimary(Token[] tok, ref int i)
{
if (tok[i].Type == TokenType.OpenParenthesis)
{
i++;
var node = parseExpressionBoolOr(tok, ref i);
if (tok[i].Type != TokenType.CloseParenthesis)
throw new InvalidOperationException(); // or customised parse exception
return node;
}
else if (tok[i].Type == TokenType.Tag)
{
var node = new TagNode(tok[i].Item);
i++;
return node;
}
else
throw new InvalidOperationException(); // or customised parse exception
}
Note that this is a greatly simplified example. However, it is maximally flexible: you can extend this algorithm to parse absolutely any language you want.
Upvotes: 7