Andrei Sfrent

Reputation: 189

Prolog DCG for parsing escaped sequences

I need to parse the string ^borrow$ ^\$500$ into the list [borrow, $500]. The grammar I have written so far is:

:- use_module(library(dcg/basics)).

write_list([]).
write_list([H|T]) :- atom_codes(S, H), write(S), nl, write_list(T).

% Grammar.
tags([Tag|Rest]) --> string(_), tag(Tag), tags(Rest).
tags([]) --> string(_).
tag(Tag) --> "^", tag_contents(Tag), "$".
tag_contents(Tag) --> string(Tag).

This works when there is no \$ inside a token, but not when there is one:

?- phrase(tags(T), "^pisica$ ^catel$"), write_list(T).
pisica
catel
?- phrase(tags(T), "^borrow$ ^\\$500$"), write_list(T).
borrow
\

What is the best practice for parsing this kind of escape sequence with Prolog DCGs?

Upvotes: 1

Views: 212

Answers (1)

CapelliC

Reputation: 60034

The problem is that tag_contents//1 captures just the backslash, and the $ that follows then acts as the tag terminator in the parent call.
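The shortest-match behaviour can be seen in isolation. The following is a minimal sketch, assuming SWI-Prolog with library(dcg/basics); first_chunk//1 and first_chunk_atom/2 are hypothetical names used only for illustration.

```prolog
:- use_module(library(dcg/basics)).   % string//1, remainder//1

% Same shape as tag_contents//1 followed by "$" in the question's grammar:
% string//1 tries the shortest prefix first, so it stops at the first "$",
% whether or not a backslash precedes it.
first_chunk(Chunk) --> string(Chunk), "$", remainder(_).

% Hypothetical wrapper: atom in, first chunk out as an atom.
first_chunk_atom(Input, Chunk) :-
    atom_codes(Input, Codes),
    once(phrase(first_chunk(ChunkCodes), Codes)),
    atom_codes(Chunk, ChunkCodes).
```

With this loaded, ?- first_chunk_atom('\\$500$', C). should bind C to the single backslash, which is the stray \ printed in the question.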

Here is an ugly hack around this problem:

tag(Tag1) -->
   "^", tag_contents(Tag), [C], "$", {C \= 0'\\, append(Tag, [C], Tag1) }.

edit

A somewhat better one:

tag(Tag) --> "^", tag_contents(Tag), "$", {\+last(Tag, 0'\\)}.
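To try this version against the question's input, it can be dropped into the original grammar. This is a sketch assuming SWI-Prolog; first_parse/2 and codes_atom/2 are hypothetical helpers, and once/1 is used because the question's tags//1 leaves choice points behind.

```prolog
:- use_module(library(dcg/basics)).   % string//1
:- use_module(library(lists)).        % last/2
:- use_module(library(apply)).        % maplist/3

% Grammar from the question, with only tag//1 replaced.
tags([Tag|Rest]) --> string(_), tag(Tag), tags(Rest).
tags([]) --> string(_).
tag(Tag) --> "^", tag_contents(Tag), "$", { \+ last(Tag, 0'\\) }.
tag_contents(Tag) --> string(Tag).

% Hypothetical wrapper: first parse only, tags returned as atoms.
first_parse(Input, Atoms) :-
    atom_codes(Input, Codes),
    once(phrase(tags(Tags), Codes)),
    maplist(codes_atom, Tags, Atoms).

codes_atom(Codes, Atom) :- atom_codes(Atom, Codes).
```

Here ?- first_parse('^borrow$ ^\\$500$', A). should give A = [borrow, '\\$500']: the second tag is now found, but the backslash stays in the result. If the input format also allowed an escaped backslash right before the closing $, this check would wrongly reject that terminator.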

edit

The 'best practice' is, of course, to handle nested content with contextual rules. You need more code, though...

tag(Tag) --> "^", tag_contents(Tag).

tag_contents([0'\\,C|Cs]) --> "\\", [C], !, tag_contents(Cs).
tag_contents([]) --> "$".
tag_contents([C|Cs]) --> [C], tag_contents(Cs).
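Combined with the question's tags//1, the contextual rules can be tried end to end. The following is a minimal sketch, assuming SWI-Prolog with library(dcg/basics). It differs from the rules above in two ways, both of which are assumptions rather than part of the answer: the escape clause drops the backslash so that the result matches the [borrow, $500] asked for, and cuts commit to the first unescaped $ and to the first tag found. parse_tags/2 and codes_atom/2 are hypothetical helper names.

```prolog
:- use_module(library(dcg/basics)).   % string//1, remainder//1
:- use_module(library(apply)).        % maplist/3

% Skip text up to the next tag, commit to it, then continue.
tags([Tag|Rest]) --> string(_), tag(Tag), !, tags(Rest).
tags([])         --> remainder(_).    % no further tag: discard the rest

tag(Tag) --> "^", tag_contents(Tag).

% Assumption: the backslash is dropped from the result (the rules above keep it).
tag_contents([C|Cs]) --> "\\", [C], !, tag_contents(Cs).  % escaped character
tag_contents([])     --> "$", !.                          % unescaped $ ends the tag
tag_contents([C|Cs]) --> [C], tag_contents(Cs).           % ordinary character

% Hypothetical wrapper: atom in, list of atoms out.
parse_tags(Input, Tags) :-
    atom_codes(Input, Codes),
    phrase(tags(CodeLists), Codes),
    maplist(codes_atom, CodeLists, Tags).

codes_atom(Codes, Atom) :- atom_codes(Atom, Codes).
```

With this loaded, ?- parse_tags('^borrow$ ^\\$500$', Tags). should bind Tags to [borrow, '$500']. Passing the input as an atom and converting it with atom_codes/2 avoids depending on the double_quotes flag, which differs between Prolog systems and versions.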

Upvotes: 0
