Reputation: 173
I'm trying to understand CFG by using an example with some obstacles in it.
For example, I want to match the declaration of a double variable:
double d; In this case, "d" could be any other valid identifier.
There are some cases that should not be matched, e.g. "double double;", but I don't understand how to avoid a match of the second "double"
My approach:
G = (Σ, V, S, P)
Σ = {a-z}
V = {S,T,U,W}
P = { S -> doubleTUW
T -> _(space)
U -> (a-z)U | (a-z)
W -> ;
}
Now there must be a way to limit the possible outcomes of this grammar by using the function L(G). Unfortunately, I couldn't find a syntax that meet my requirement to deny a second "double".
Upvotes: 0
Views: 145
Reputation: 241701
Here's a somewhat tedious regular expression to match any identifier other than double
. Converting it to a CFG can be done mechanically but it is even more tedious.
([a-ce-z]|d[a-np-z]|do[a-tv-z]|dou[ac-z]|doub[a-km-z]|doubl[a-df-z]|double[a-z])[a-z]*
Converting it to a CFG can be done mechanically but it is even more tedious:
ALPHA → a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_B → a|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_D → a|b|c|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_E → a|b|c|d|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_L → a|b|c|d|e|f|g|h|i|j|k|m|n|o|p|q|r|s|t|u|v|w|x|y|z
NOT_O → a|b|c|d|e|f|g|h|i|j|k|l|m|n|p|q|r|s|t|u|v|w|x|y|z
NOT_U → a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|v|w|x|y|z
WORD → NOT_D
| d NOT_O
| do NOT_U
| dou NOT_B
| doub NOT_L
| doubl NOT_E
| double ALPHA
| WORD ALPHA
This is why many of us usually use scanner generators like (f)lex which handle such exclusions automatically.
Upvotes: 1