I want to learn how to use regular expressions in Java and found the following task: write a class that checks whether a given input string is a valid arithmetic term according to these rules in BNF:
term = [Vz]summand|[Vz]summand addOp term
summand = factor | factor mulOp summand
factor = number | '('term')'
number = digit | digit number
digit = '0'|'1'|...|'9'
vz = '+'|'-'
addOp = '+'|'-'
mulOp = '*'|'/'
Based on these rules, I wrote one pattern per rule:
static Pattern vz = Pattern.compile("[+-]");
static Pattern addOp = Pattern.compile("[+-]");
static Pattern multOp = Pattern.compile("[*/]");
static Pattern digit= Pattern.compile("[0-9]");
static Pattern number = Pattern.compile(digit.pattern()+"+");
static Pattern factor = Pattern.compile(number.pattern()+"|("+term.pattern()+")");
static Pattern summand = Pattern.compile(factor.pattern()+"|"+factor.pattern()+ multOp.pattern()+"\n");
static Pattern term = Pattern.compile(vz.pattern()+"?"+summand.pattern()+"|"
+vz.pattern()+"?"+summand.pattern()+addOp.pattern()+"\n");
As you can already see, my problem is that I reference term in the definition of factor before term is defined. Unfortunately, I cannot reorder the definitions to avoid this, because the rules refer to each other recursively. So my question is:
Is there a way to reference a pattern like this? Or is there any other way to reference a pattern and define it later on?
The problem is that BNF defines a context-free grammar, and context-free grammars can describe languages that regular expressions cannot. You will have to come up with a different approach than translating the BNF rules directly into regex patterns.
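Only the parenthesized factor actually needs recursion. The right-recursive rules summand = factor mulOp summand and term = [Vz]summand addOp term can be rewritten as repetition, so the grammar without the '(' term ')' alternative is regular. The following is a minimal sketch assuming that restricted grammar (no parentheses): it composes the pattern from the rules the way the question does, but wraps each piece in a non-capturing group (?:...) so that an alternation inside one rule cannot swallow its neighbours. The class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

public class FlatTermRegex {
    static final String VZ = "[+-]";
    static final String ADD_OP = "[+-]";
    static final String MUL_OP = "[*/]";
    static final String NUMBER = "[0-9]+";

    // Assumption: factor = number only; the recursive '(' term ')' alternative is dropped.
    static final String FACTOR = "(?:" + NUMBER + ")";

    // summand = factor | factor mulOp summand  ==>  factor (mulOp factor)*
    static final String SUMMAND = "(?:" + FACTOR + "(?:" + MUL_OP + FACTOR + ")*)";

    // term = [Vz]summand | [Vz]summand addOp term  ==>  [Vz]summand (addOp [Vz]summand)*
    static final String SIGNED = "(?:" + VZ + "?" + SUMMAND + ")";
    static final Pattern TERM = Pattern.compile(SIGNED + "(?:" + ADD_OP + SIGNED + ")*");

    static boolean isFlatTerm(String input) {
        // matches() requires the entire input to match, like wrapping the regex in ^...$
        return TERM.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12", "-3*4+5", "1+-2/7", "1++", "(1+2)*3"}) {
            System.out.println(s + " -> " + isFlatTerm(s));
        }
    }
}
```

With this sketch, 12, -3*4+5 and 1+-2/7 match, while 1++ does not. (1+2)*3 is rejected as well, because parentheses are exactly the part that cannot be expressed this way.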
In particular, correctly nested parentheses are not a regular language. Some regex engines support (non-regular) features that can match them, such as recursion in PCRE or balancing groups in .NET, but the resulting regexes often become very long and unmaintainable. Java's java.util.regex does not support either of these features.
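Whether an engine supports recursion is easy to check. This minimal sketch (hypothetical class and helper names, assuming Java 8 or later) passes PCRE-style recursion syntax to Pattern.compile: java.util.regex has no such construct, so it rejects (?1) and (?R) with a PatternSyntaxException instead of treating them as recursion.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RecursionCheck {
    /** Returns true if java.util.regex accepts the given pattern string. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Rejected " + regex + ": " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // PCRE subroutine call to group 1: balanced parentheses such as "(()())"
        System.out.println(compiles("^(\\((?1)*\\))$"));
        // PCRE whole-pattern recursion
        System.out.println(compiles("\\((?R)*\\)"));
        // A plain, non-recursive pattern is accepted
        System.out.println(compiles("\\(\\d+\\)"));
    }
}
```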
If you want to solve the task at hand, you will have to write a parser by hand. If you want to learn regular expressions, you will have to either use another language or pick a different task. That said, here is a great resource to improve your regex skills.
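A hand-written parser for this grammar can be sketched as a recursive-descent recognizer with one method per BNF rule. This is a minimal sketch, assuming the input contains no whitespace and that only a yes/no answer is needed (no evaluation, no error positions); the class and method names are hypothetical. Because every rule can be decided by looking at the next character, the methods never need to backtrack.

```java
public class TermParser {
    private final String input;
    private int pos = 0;

    private TermParser(String input) {
        this.input = input;
    }

    /** Returns true if the whole input is a valid term (no whitespace allowed). */
    public static boolean isValidTerm(String input) {
        TermParser parser = new TermParser(input);
        // The term must be valid AND consume the entire input.
        return parser.term() && parser.pos == input.length();
    }

    /** Consumes the next character if it is one of the given characters. */
    private boolean accept(String chars) {
        if (pos < input.length() && chars.indexOf(input.charAt(pos)) >= 0) {
            pos++;
            return true;
        }
        return false;
    }

    // term = [Vz]summand | [Vz]summand addOp term
    private boolean term() {
        accept("+-");                      // optional sign (Vz)
        if (!summand()) {
            return false;
        }
        if (accept("+-")) {                // addOp: another term must follow
            return term();
        }
        return true;
    }

    // summand = factor | factor mulOp summand
    private boolean summand() {
        if (!factor()) {
            return false;
        }
        if (accept("*/")) {                // mulOp: another summand must follow
            return summand();
        }
        return true;
    }

    // factor = number | '(' term ')'
    private boolean factor() {
        if (accept("(")) {
            return term() && accept(")");
        }
        return number();
    }

    // number = digit | digit number, with digit = '0'..'9'
    private boolean number() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
        }
        return pos > start;                // at least one digit
    }

    public static void main(String[] args) {
        String[] samples = {"12", "-(1+2)*3", "1*2+3", "(1+2", "1+", "2*-3"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidTerm(s));
        }
    }
}
```

With this sketch, 12, -(1+2)*3 and 1*2+3 are accepted, while (1+2, 1+ and 2*-3 are rejected. The last one fails because the grammar allows a sign only at the start of a term, not in front of a factor.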
For the fun of it (and to show you why regular expressions are not the right tool for this, even if the engine supports the necessary features), here is a regular expression that corresponds to the above BNF, except for the Vz rule (for some odd reason I could not get that part to work):
^(((\d+|[(](?1)[)])|(?3)[*\/](?2))|(?2)[+-](?1))$
Each (?n) recursively tries to match the nth subpattern (subpatterns are numbered by their opening parentheses, counted from left to right).
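Java has no (?n), but it numbers capturing groups by the same left-to-right rule, which makes the numbering easy to check. This minimal sketch (hypothetical class and helper names) prints the groups of a small non-recursive pattern with nested groups; in a PCRE regex, (?2) would refer to the same subpattern that group(2) returns here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    /** Returns the contents of groups 1..n after a full match, or an empty list. */
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Opening parentheses from left to right:
        // group 1 = whole product, group 2 = first factor with operator,
        // group 3 = first factor, group 4 = second factor
        List<String> g = groups("(((\\d+)[*/])(\\d+))", "12*3");
        for (int i = 0; i < g.size(); i++) {
            System.out.println("group " + (i + 1) + ": " + g.get(i));
        }
    }
}
```

For the input 12*3 this prints 12*3, 12*, 12 and 3 as groups 1 to 4.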
It does not work in PHP, which I believe is due to how its PCRE implementation backtracks when recursion is involved: the original PCRE library treats recursive subpattern calls as atomic, so the engine cannot backtrack into them. An online PCRE tester did seem to handle some example inputs correctly. Here it is in free-spacing mode (x) with some annotations:
^
( # term (?1)
( # summand (?2)
( # factor (?3)
\d+ # number
|
[(](?1)[)] # (term)
) # end of factor
|
(?3)[*/](?2) # factor mulOp summand
) # end of summand
|
(?2)[+-](?1) # summand addOp term
) # end of term
$