Is there an algorithm for determining if given sentence is a sub-sentence of a grammar?

Question

For example, given the grammar

Expr -> Number | Number '+' Expr
Number -> [1-9][0-9]*

we see that for + 1 there exists a sentence (e.g. 1 + 1) which is parsed by the grammar and for which + 1 is sub-sentence. Is there a general algorithm for this? I think that if we can put some kind of flags in the parser we should be able to tell it to skip some initial and some final tokens while parsing but I'm not sure whether this would work. Any ideas?

David Eisenstat · Accepted Answer

I'm sure there's a better way, but this argument shows that the class of context-free languages is closed under this operation via a linear-time transformation of a Chomsky normal form grammar.

The idea is to introduce, for each nonterminal symbol A, three other symbols Apre, Asuf, Asub, which match prefixes, suffixes, and substrings of strings matched by A.

For transitions A -> s, add

Apre ->
Apre -> s
Asuf ->
Asuf -> s
Asub ->
Asub -> s.

For transitions A -> B C, add

Apre -> Bpre
Apre -> B Cpre
Asuf -> Csuf
Asuf -> Bsuf C
Asub -> Bsub
Asub -> Csub
Asub -> Bsuf Cpre.

Change the start symbol from S to Ssub.

Is there an algorithm for determining if given sentence is a sub-sentence of a grammar?

Answers (1)

Related Questions