Reputation: 2161
If I wanted, for example, to define the Lisp programming language, where a name can include non-alphanumeric characters, should I list all the usable characters with a notation like:
validchar ::= "a" | "b" | "c" ... "-" | "*" | "$" ... ;
name ::= validchar, (validchar | digit)*;
Or am I allowed to use regexes, like:
validchar ::= "[^()\s\d]";
name ::= validchar, (validchar | digit)*;
Or even:
name ::= "[^()\s\d]", "[^()\s]"*;
This would shorten it a lot, and it would include even characters like ₩, ¥, € and so on, which I can't list but are actually usable.
Upvotes: 2
Views: 97
Reputation: 95362
Whether this is allowed depends on the tool you are using that implements the (E)BNF notation.
Some tools are rather strict and stick to the original definition of (E)BNF, allowing at best Kleene * or + on language tokens. An additional point is that there is no requirement for classic (E)BNF to operate on characters as terminals.
Clearly it is convenient to be able to define some language tokens directly in terms of characters, and one can imagine (as you have) an EBNF in which one can write not only characters as terminals, but also regexes over characters.
Whether the tool you propose to use allows that... depends entirely on the tool. Many tools that process (E)BNF, such as YACC, are actually designed to work in conjunction with another tool, a "lexer generator" (for YACC, this is traditionally LEX, or its free reimplementation Flex), that defines the character sequences making up each token. With such tool pairs, the (E)BNF tool typically does not allow any mention of characters or regexes over them, but the lexer generator explicitly does allow character and regex specifications for tokens.
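To make that split concrete, here is a minimal hand-written sketch in Python rather than the output of any real generator. The token names (`LPAREN`, `RPAREN`, `NUMBER`, `NAME`, `WS`) and the helper functions `tokenize`, `parse` and `_expr` are hypothetical, and the `NAME` regex is just the one proposed in the question, not an actual Lisp definition. The lexer layer is the only place where characters and regexes appear; the parser layer implements the rule `expr ::= NAME | NUMBER | LPAREN expr* RPAREN` purely over tokens.

```python
import re

# Lexer layer: tokens are defined by regexes over characters.
# NAME follows the question's proposal (an assumption, not a Lisp standard):
# the first character is anything except a parenthesis, whitespace or digit;
# the remaining characters may also be digits.
TOKEN_RE = re.compile(r"""
    (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<NUMBER>\d+)
  | (?P<NAME>[^()\s\d][^()\s]*)
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Yield (kind, lexeme) pairs, skipping whitespace."""
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != "WS":
            yield m.lastgroup, m.group()


# Parser layer: the grammar mentions token kinds only, never characters.
#   expr ::= NAME | NUMBER | LPAREN expr* RPAREN
def _expr(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, lexeme = tokens[pos]
    if kind == "NAME":
        return lexeme, pos + 1
    if kind == "NUMBER":
        return int(lexeme), pos + 1
    if kind == "LPAREN":
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos][0] != "RPAREN":
            item, pos = _expr(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1  # step over the closing parenthesis
    raise SyntaxError(f"unexpected {lexeme!r}")


def parse(tokens):
    """Parse exactly one expression from an iterable of tokens."""
    tokens = list(tokens)
    tree, pos = _expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return tree


print(parse(tokenize("(set! x-y* (+ €1 42))")))
```

Because the negated character class lives in the lexer, characters such as € are accepted in names without being listed, while the grammar itself stays within what a strict (E)BNF tool would accept.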
There are hundreds of (E)BNF and lexer generator tools, each with somewhat (or egregiously) different rules. Check the tool documentation.
Or write it the way you want to write it, and build your own (101st) tool.
Upvotes: 3