anru
anru

Reputation: 1414

BNF grammar to recoganize of legal combination of C keywords

let say I have a list of C keywords, int char long float double signed unsigned short const volatile

the grammar should accept those combinations:

volatile unsigned long int
long unsigned volatile int

etc.

but reject something like this:

unsigned singed short long 

Upvotes: 1

Views: 75

Answers (2)

Luis Colorado
Luis Colorado

Reputation: 12668

The problem is that the BNF grammar rules for a set of n words that can appear only once, but in any order has, in general 2^n rules. It is simpler to accept them in any order and reject the non possible combinations (e.g. long can appear twice but no more times)

The demonstration is simple, a finite automaton can be built with 2^n states, that, for each state reflects a bit pattern with 1s on the symbols that have already been parsed, and 0s for the symbols that are still to be expected. That automaton generates an n dimensional hyper cube for every transition of 0 to 1 for each bit position. Converting this automaton to a set of BNF rules is trivial, but the set (of states, and of rules) is optimal (it's easy to demonstrate that missing only one such states, will make a set of n transitions ---one for each arc entering or exiting that node--- to be impossible)

This means that, if you have e.g. 10 symbols (as you show in the question) you can end with a grammar ruleset of the order of 1024 rules. As the order in which the words appear doesn't alter the program semantics, is better to allow them in whatever order they come, and count each, for possible excess repetitions. The result will be far more compact, and more scalable, in case you need to add more keywords.

My suggestion is to allow:

list_of_keywords: list_of_keywords 'keyword1'
                | list_of_keywords 'keyword2'
                | list_of_keywords 'keyword3'
                ...
                | list_of_keywords 'keywordn'
                | ;

and process the list once input.

Upvotes: 1

Eric Postpischil
Eric Postpischil

Reputation: 222923

A grammar for legal combinations of C type specifiers in list of declaration specifiers is:

OODS AtomicTypeSpecifier OODS
OODS EnumSpecifier OODS
OODS StructOrUnionSpecifier OODS
OODS TypeDefName OODS
OODS _Bool OODS
OODS _Complex OODS double OODS
OODS _Complex OODS double OODS long OODS
OODS _Complex OODS float OODS
OODS _Complex OODS long OODS double OODS
OODS char OODS
OODS char OODS signed OODS
OODS char OODS unsigned OODS
OODS double OODS
OODS double OODS _Complex OODS
OODS double OODS _Complex OODS long OODS
OODS double OODS long OODS
OODS double OODS long OODS _Complex OODS
OODS float OODS
OODS float OODS _Complex OODS
OODS int OODS
OODS int OODS long OODS
OODS int OODS long OODS long OODS
OODS int OODS long OODS long OODS signed OODS
OODS int OODS long OODS long OODS unsigned OODS
OODS int OODS long OODS signed OODS
OODS int OODS long OODS signed OODS long OODS
OODS int OODS long OODS unsigned OODS
OODS int OODS long OODS unsigned OODS long OODS
OODS int OODS short OODS
OODS int OODS short OODS signed OODS
OODS int OODS short OODS unsigned OODS
OODS int OODS signed OODS
OODS int OODS signed OODS long OODS
OODS int OODS signed OODS long OODS long OODS
OODS int OODS signed OODS short OODS
OODS int OODS unsigned OODS
OODS int OODS unsigned OODS long OODS
OODS int OODS unsigned OODS long OODS long OODS
OODS int OODS unsigned OODS short OODS
OODS long OODS
OODS long OODS _Complex OODS double OODS
OODS long OODS double OODS
OODS long OODS double OODS _Complex OODS
OODS long OODS int OODS
OODS long OODS int OODS long OODS
OODS long OODS int OODS long OODS signed OODS
OODS long OODS int OODS long OODS unsigned OODS
OODS long OODS int OODS signed OODS
OODS long OODS int OODS signed OODS long OODS
OODS long OODS int OODS unsigned OODS
OODS long OODS int OODS unsigned OODS long OODS
OODS long OODS long OODS
OODS long OODS long OODS int OODS
OODS long OODS long OODS int OODS signed OODS
OODS long OODS long OODS int OODS unsigned OODS
OODS long OODS long OODS signed OODS
OODS long OODS long OODS signed OODS int OODS
OODS long OODS long OODS unsigned OODS
OODS long OODS long OODS unsigned OODS int OODS
OODS long OODS signed OODS
OODS long OODS signed OODS int OODS
OODS long OODS signed OODS int OODS long OODS
OODS long OODS signed OODS long OODS
OODS long OODS signed OODS long OODS int OODS
OODS long OODS unsigned OODS
OODS long OODS unsigned OODS int OODS
OODS long OODS unsigned OODS int OODS long OODS
OODS long OODS unsigned OODS long OODS
OODS long OODS unsigned OODS long OODS int OODS
OODS short OODS
OODS short OODS int OODS
OODS short OODS int OODS signed OODS
OODS short OODS int OODS unsigned OODS
OODS short OODS signed OODS
OODS short OODS signed OODS int OODS
OODS short OODS unsigned OODS
OODS short OODS unsigned OODS int OODS
OODS signed OODS
OODS signed OODS char OODS
OODS signed OODS int OODS
OODS signed OODS int OODS long OODS
OODS signed OODS int OODS long OODS long OODS
OODS signed OODS int OODS short OODS
OODS signed OODS long OODS
OODS signed OODS long OODS int OODS
OODS signed OODS long OODS int OODS long OODS
OODS signed OODS long OODS long OODS
OODS signed OODS long OODS long OODS int OODS
OODS signed OODS short OODS
OODS signed OODS short OODS int OODS
OODS unsigned OODS
OODS unsigned OODS char OODS
OODS unsigned OODS int OODS
OODS unsigned OODS int OODS long OODS
OODS unsigned OODS int OODS long OODS long OODS
OODS unsigned OODS int OODS short OODS
OODS unsigned OODS long OODS
OODS unsigned OODS long OODS int OODS
OODS unsigned OODS long OODS int OODS long OODS
OODS unsigned OODS long OODS long OODS
OODS unsigned OODS long OODS long OODS int OODS
OODS unsigned OODS short OODS
OODS unsigned OODS short OODS int OODS
OODS void OODS

Notes:

  • The lines are ORed (connected by an | operator).
  • OODS is optional other declaration specifiers (zero or more storage class specifiers, qualifiers, and so on).
  • AtomicTypeSpecifier, EnumSpecifier, StructOrUnionSpecifier, and TypeDefName are the obvious tokens from the ordinary C grammar.

Upvotes: 1

Related Questions