Reputation: 19
I am trying to convert a CFG to a CNF, but I am unsure what to identify as 'variables'. Here is the problem:
S -> aA | ABa
A -> AA | a
B -> AbA | bb
I have added a new start variable to make it
S' -> S
S -> aA | ABa
A -> AA | a
B -> AbA | bb
Then, after unit production removal, it is:
S' -> aA | ABa
S -> aA | ABa
A -> AA | a
B -> AbA | bb
I know the next step would be to change any production that have more than 2 variables, but is ABa three variables? Or is it two variables and a terminal?
If it is two variables and a terminal, to finally simplify it, am I able to create something like this:
S' -> aA | Xa
S -> aA | Xa
A -> AA | a
B -> Yb | bb
X -> AB
Y -> AA
Thanks!
Upvotes: 0
Views: 4352
Reputation: 1579
The grammars in the Chomsky normal form has the following production formats:
A → BC, or
A → a, or
S → ε,
It is made of symbols (a
, A
...). These symbols are in two sets: terminal symbols (as a
and b
, lower case letters, both part of the alphabet) and non terminal symbols (as A
and B
, upper case letters). The grammar also has productions (like S → a
), and a starting symbol (a non-terminal) S
.
You start from this grammar (there is no need of new starting rule):
S → aA | ABa
A → AA | a
B → AbA | bb
Separate the concatenations into their own productions:
S → aA
S → ABa
A → AA
A → a
B → AbA
B → bb
Move all terminal symbols into their own non-terminal symbols (a
into C
(but keep the a
in A
, because its already valid in the CNF) and b
into D
):
S → CA
S → ABC
A → AA
A → a
B → ADA
B → DD
C → a
D → b
Split the the right hand sides where there are more than two non-terminal symbols:
S → CA
S → AE
A → AA
A → a
B → AF
B → DD
C → a
D → b
E → BC
F → DA
Then you are done.
Upvotes: 0