Prolog DCG for parsing escaped sequences

Question

I need to parse the string ^borrow$ ^\$500$ into the list [borrow, $500]. The grammar I wrote so far is:

:- use_module(library(dcg/basics)).

write_list([]).
write_list([H|T]) :- atom_codes(S, H), write(S), nl, write_list(T).

% Grammar.
tags([Tag|Rest]) --> string(_), tag(Tag), tags(Rest).
tags([]) --> string(_).
tag(Tag) --> "^", tag_contents(Tag), "$".
tag_contents(Tag) --> string(Tag).

This works as long as no token contains \$, but breaks when one does:

?- phrase(tags(T), "^pisica$ ^catel$"), write_list(T).
pisica
catel
?- phrase(tags(T), "^borrow$ ^\$500$"), write_list(T).
borrow
\

What is the best practice for parsing this kind of escaped sequences with Prolog DCGs?

CapelliC · Accepted Answer

The problem is that string//1 tries the shortest match first, so tag_contents//1 captures just the backslash, and the escaped $ then acts as the tag terminator in the parent call.
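A minimal way to see this in isolation is to run only the question's tag//1 on the second token, using phrase/3 so that trailing input is allowed. This is a sketch that assumes SWI-Prolog 7 or later, where backquoted text is a code list. The names naive_tag//1 and show_first_tag/1 are hypothetical.

```prolog
:- use_module(library(dcg/basics)).

% The question's tag//1 under a hypothetical name. string//1 enumerates
% candidates shortest-first, so the first solution ends at the first "$".
naive_tag(Tag) --> "^", string(Tag), "$".

% Hypothetical helper: print the contents of the tag at the start of Codes.
show_first_tag(Codes) :-
    once(phrase(naive_tag(Tag), Codes, _Rest)),
    atom_codes(Atom, Tag),
    writeln(Atom).
```

Calling show_first_tag(`^\\$500$`) prints a lone backslash, and the text 500$ is left unparsed.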

Here is an ugly hack around the problem:

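% Force one extra character C before the closing "$" and reject it if it
% is a backslash, so an escaped \$ cannot end the tag.
tag(Tag1) -->
   "^", tag_contents(Tag), [C], "$", { C \= 0'\\, append(Tag, [C], Tag1) }.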
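A quick check of the hack on single tags follows. It is a sketch that uses the question's tag_contents//1 and assumes SWI-Prolog 7 backquoted code lists. The extra [C] has two side effects worth knowing about. The backslash stays in the result, and an empty tag ^$ no longer parses, because at least one character must precede the closing $.

```prolog
:- use_module(library(dcg/basics)).
:- use_module(library(lists)).

% The question's tag_contents//1.
tag_contents(Tag) --> string(Tag).

% The hack: the character just before "$" must not be a backslash.
tag(Tag1) -->
    "^", tag_contents(Tag), [C], "$",
    { C \= 0'\\, append(Tag, [C], Tag1) }.
```

For example, phrase(tag(T), `^\\$500$`) binds T to the codes of \$500, while phrase(tag(T), `^$`) fails.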

Edit

A somewhat better one:

% Accept the tag only if its contents do not end in a backslash.
tag(Tag) --> "^", tag_contents(Tag), "$", { \+ last(Tag, 0'\\) }.
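Here is that version plugged into the question's tags//1. This is a sketch under the same SWI-Prolog 7 assumption. codes_atom/2 and tag_atoms/2 are hypothetical helpers that turn the parsed code lists into atoms.

When a candidate ends in a backslash, the guard fails. phrase/2 then backtracks into string//1, and the match is extended past the escaped $. The backslash itself is still part of the result.

```prolog
:- use_module(library(dcg/basics)).
:- use_module(library(lists)).
:- use_module(library(apply)).

% The question's grammar, with the guarded tag//1.
tags([Tag|Rest]) --> string(_), tag(Tag), tags(Rest).
tags([]) --> string(_).

% Reject contents ending in a backslash: that "$" was escaped.
tag(Tag) --> "^", tag_contents(Tag), "$", { \+ last(Tag, 0'\\) }.
tag_contents(Tag) --> string(Tag).

% Hypothetical helpers: parse Codes and return each tag as an atom.
codes_atom(Codes, Atom) :- atom_codes(Atom, Codes).

tag_atoms(Codes, Atoms) :-
    phrase(tags(Tags), Codes),
    maplist(codes_atom, Tags, Atoms).
```

tags//1 leaves many choice points because of the string(_) calls, so only the first solution is meaningful. For example, once(tag_atoms(`^borrow$ ^\\$500$`, As)) binds As to [borrow, '\\$500'].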

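Edit

The 'best practice' is, of course, to handle escaped content with dedicated rules. One rule consumes the escape character together with the character it escapes, one handles the terminator, and one handles ordinary characters. It takes more code, though: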

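tag(Tag) --> "^", tag_contents(Tag).

% An escape: keep the backslash and the character it protects, and commit.
tag_contents([0'\\,C|Cs]) --> "\\", [C], !, tag_contents(Cs).
% An unescaped "$" ends the tag.
tag_contents([]) --> "$".
% Any other character is ordinary content. The guard stops this clause
% from also accepting "$" (and so spanning tags) on backtracking.
tag_contents([C|Cs]) --> [C], { C \== 0'$ }, tag_contents(Cs).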
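The rules above keep the backslash, so the second token comes out as \$500. If the goal is [borrow, $500] as stated in the question, a variant that drops the escape character might look like the following sketch. It assumes SWI-Prolog 7, and tag_chars//1, codes_atom/2 and tag_atoms/2 are hypothetical names. With this variant, a literal backslash is written as \\.

```prolog
:- use_module(library(dcg/basics)).
:- use_module(library(apply)).

% The question's top-level grammar.
tags([Tag|Rest]) --> string(_), tag(Tag), tags(Rest).
tags([]) --> string(_).

tag(Tag) --> "^", tag_chars(Tag).

% An escape: drop the backslash and keep the next character literally.
tag_chars([C|Cs]) --> "\\", [C], !, tag_chars(Cs).
% An unescaped "$" ends the tag.
tag_chars([]) --> "$".
% Any other character (never "$" or a backslash) belongs to the tag.
tag_chars([C|Cs]) --> [C], { C \== 0'$, C \== 0'\\ }, tag_chars(Cs).

% Hypothetical helpers: parse Codes and return each tag as an atom.
codes_atom(Codes, Atom) :- atom_codes(Atom, Codes).

tag_atoms(Codes, Atoms) :-
    phrase(tags(Tags), Codes),
    maplist(codes_atom, Tags, Atoms).
```

The escape handling lives entirely in tag_chars//1, so tags//1 and tag//1 need no lookahead or post-filtering. For example, once(tag_atoms(`^borrow$ ^\\$500$`, As)) binds As to [borrow, '$500'].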
