parsing an expression using DCG

Question

I try to parse a list of identifiers with DCG but I fail to do it,

example

                          ::=
                            |
                          __|____
                         |       |
                        Id     Id,'(',Ids,')'.
                                       _|_
                                      |   |
                                     Id   Id,

-

ids ::= id.
ids ::= id(id1,id2,...).

example of identifier to verify

  exp1 = 'id_0'
  exp3 = 'id_1(i)'
  exp2 = 'id_2(i1,i2)'

code edit

id([Id]) --> [Id].
id([Id|Ids]) --> [Id,'('], ids(Ids), [')'].

ids([Id]) --> [Id].
ids([Id|Ids]) --> [Id], ids(Ids).

someone tell me where is the problem ?

mat · Accepted Answer

Let us first see what you are actually currently describing with (for example) id//1, using the most general query:

?- phrase(id(Is), Ls).
Is = Ls, Ls = [_G884] ;
Is = [_G884|_G885],
Ls = [_G884, '(', ids(_G885), ')'].

Trying out the most general query is often a good way to find out more about a relation. In this case, we already note two very important things:

Regarding the list Ls of tokens, the first solution is clearly more general than intended. No clause that you add to this program (while preserving logical-purity) can remove this flaw. Therefore, you need to make the first rule more specific.
The tokens described by the second answer do not actually match the grammar, because they contain the concrete token ids(_). This points you to the fact that you are intermingling DCG rule names with the tokens you want to describe.

For a start, let us first focus an describing the list of tokens that we want to describe. This simplifies the code somewhat, since we do not have to deal with so many things at once.

I suggest you start with the following:

ids --> [id].
ids --> [id,'('], ids_, [')'].

ids_ --> ids, more_ids_.

more_ids_ --> [].
more_ids_ --> [,], ids_.

Notice the helpful pattern that allows us to specify "more of the same", which is either nothing or a comma and then "the same".

I suppose you will eventually make this more general, and for example accept also other identifiers besides the concrete token id. The place to do this is clear from the code above, and I leave it as an easy exercise.

Notice that we can already use this to see whether we are describing only intended lists of tokens:

?- length(Ls, _), phrase(ids, Ls).
Ls = [id] ;
Ls = [id, '(', id, ')'] ;
Ls = [id, '(', id, ',', id, ')'] ;
Ls = [id, '(', id, '(', id, ')', ')'] ;
etc.

I am using length/2 for fair enumeration.

Once you have made sure that you are describing all and only lists of tokens you intended, you can then start to relate such lists of tokens to other Prolog terms. This will often involve (=..)/2 and other relations between terms.

parsing an expression using DCG

Answers (2)

Related Questions