Reputation: 189
I need to parse the string ^borrow$ ^\$500$ into the list [borrow, $500]. The grammar I wrote so far is:
:- use_module(library(dcg/basics)).
write_list([]).
write_list([H|T]) :- atom_codes(S, H), write(S), nl, write_list(T).
% Grammar.
tags([Tag|Rest]) --> string(_), tag(Tag), tags(Rest).
tags([]) --> string(_).
tag(Tag) --> "^", tag_contents(Tag), "$".
tag_contents(Tag) --> string(Tag).
This works when there is no \$ inside a token, but not when there is one:
?- phrase(tags(T), "^pisica$ ^catel$"), write_list(T).
pisica
catel
?- phrase(tags(T), "^borrow$ ^\\$500$"), write_list(T).
borrow
\
What is the best practice for parsing this kind of escaped sequences with Prolog DCGs?
Upvotes: 1
Views: 212
Reputation: 60034
The problem is that tag_contents//1 captures just the backslash, and then the $ acts as the tag terminator in the parent call.
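To see this in isolation, here is a small sketch that repeats the question's two tag rules and runs them on the second token alone. The first_tag/3 helper is my own hypothetical addition, not something from the question; it goes through atom_codes/2 so the query does not depend on how your Prolog reads double-quoted text.

```prolog
:- use_module(library(dcg/basics)).   % string//1, as in the question

% The two tag rules from the question, unchanged.
tag(Tag) --> "^", tag_contents(Tag), "$".
tag_contents(Tag) --> string(Tag).

% first_tag(+Atom, -Tag, -Rest): hypothetical helper for illustration.
% Parses one tag from the front of Atom, takes the first solution only,
% and returns the tag and the unparsed remainder as atoms.
first_tag(Input, Tag, Rest) :-
    atom_codes(Input, Codes),
    once(phrase(tag(TagCodes), Codes, RestCodes)),
    atom_codes(Tag, TagCodes),
    atom_codes(Rest, RestCodes).

% ?- first_tag('^\\$500$', Tag, Rest).
% Tag is the lone backslash and Rest is '500$'.
```

string//1 tries the shortest match first, so it stops as soon as the next character is $, which here is the escaped one.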
Here is an ugly hack around this problem:
tag(Tag1) -->
"^", tag_contents(Tag), [C], "$", {C \= 0'\\, append(Tag, [C], Tag1) }.
edit
a somewhat better one:
tag(Tag) --> "^", tag_contents(Tag), "$", {\+last(Tag, 0'\\)}.
edit
'Best practice' is of course to handle nested content with contextual rules. You need more code, though...
tag(Tag) --> "^", tag_contents(Tag).
tag_contents([0'\\,C|Cs]) --> "\\", [C], !, tag_contents(Cs).
tag_contents([]) --> "$".
tag_contents([C|Cs]) --> [C], tag_contents(Cs).
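Note that the rules above keep the backslash in the result, so the second tag comes out as \$500 rather than the $500 the question asks for. Below is a sketch of one possible variant that drops the backslash. It is put together with the question's tags//1 so it can be loaded as is. The parse_tags/2 and codes_atom/2 helpers and the extra cut after "$" are my own assumptions, not part of the original code.

```prolog
:- use_module(library(dcg/basics)).   % string//1, as in the question

% tags(-Tags): collect every ^...$ tag, skipping any text between tags.
tags([Tag|Rest]) --> string(_), tag(Tag), tags(Rest).
tags([]) --> string(_).

tag(Tag) --> "^", tag_contents(Tag).

% Escaped character: consume the backslash, keep only the character after it.
tag_contents([C|Cs]) --> "\\", [C], !, tag_contents(Cs).
% An unescaped $ closes the tag (the cut is an addition: it stops $ from
% being re-read as an ordinary character on backtracking).
tag_contents([]) --> "$", !.
% Any other character belongs to the tag.
tag_contents([C|Cs]) --> [C], tag_contents(Cs).

% codes_atom(+Codes, -Atom): atom_codes/2 with the arguments swapped for maplist/3.
codes_atom(Codes, Atom) :- atom_codes(Atom, Codes).

% parse_tags(+Atom, -Tags): hypothetical wrapper, first solution only.
parse_tags(Input, Tags) :-
    atom_codes(Input, Codes),
    once(phrase(tags(CodeLists), Codes)),
    maplist(codes_atom, CodeLists, Tags).

% ?- parse_tags('^borrow$ ^\\$500$', Tags).
% Tags = [borrow, '$500'].
```

If you want to keep the backslash, as in the rules above, put 0'\\ back in front of C in the head of the first tag_contents//1 clause.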
Upvotes: 0