madeofmistake
madeofmistake

Reputation: 528

Finding subsequence pairs

The intention of the functions below is to find matching pairs of length 2 subsequences. E.g. [1,2,3,4,2,3,5] has a matching pair of [2,3]. If I feed a sequence w/out a matching pair the first function gets a memory overflow but the second finishes quickly.

Why the difference and is there a general technique for how to dig into this sort of issue?

repeat_pair(Items) :-
    append(_,[X,Y|_],Left),
    append(Left,[X,Y|_],Items).

repeat_pair(Items) :-
    append(Left,[X,Y|_],Items),
    append(_,[X,Y|_],Left).

Upvotes: 4

Views: 154

Answers (3)

false
false

Reputation: 10102

Why the difference ... ?

It suffices to consider the following of your first version:

repeat_pair(Items) :-
    append(_,[X,Y|_],Left), false,
    append(Left,[X,Y|_],Items).

This fragment shares some properties with the original program: In particular, if this fragment loops, then your original program will loop, too. Now, what might influence termination? Only the variable Items might influence it, but all occurrences are behind the false. So there is no way how Items influences termination and the goal append(_,[X,Y|_],Left) does not terminate. Thus, even if this program succeeds, it will not terminate. Think of repeat_pair("abab").

... and is there a general technique for how to dig into this sort of issue?

Yes, there is. Question answered.

(It's above technique called . See the tag for more examples.)

Finally, here is a version using s:

repeat_pair(Cs) :-
   phrase(( ..., [A,B], ..., [A,B], ... ), Cs).

... --> [] | [_], ...

A generalization which also accepts non-lists like [a,b,a,b|non_list] is below. Often, such generalizations are preferred, as they are slightly shorter and slightly more efficient and may possess slightly better termination properties (not in this case).

repeat_pair(Cs) :-
   phrase(( ..., [A,B], ..., [A,B] ), Cs, _).

Upvotes: 2

CapelliC
CapelliC

Reputation: 60004

Prolog operational semantic is based on chronologic backtracking, and this means that goals order is important for correctness, as it provides instantiation of variables in a predictable way. That said, you could use append/2 instead of append/3 to achieve the same result:

repeat_pair(Items) :- append([_,[X,Y],_,[X,Y],_],Items).

effectively moving into a library predicate the logic.

Upvotes: 2

Isabelle Newbie
Isabelle Newbie

Reputation: 9378

As you observe in your comment, in the first variant there is no relation between Items and the goal append(_, [X,Y|_], Left), so on backtracking this goal will enumerate all its infinitely many solutions.

Your second approach is the best one for this problem, but indeed in general you sometimes need to restrict the search space in other ways.

The problem with the first clause is that Left is completely unconstrained. But we know that we never want it to be longer than Items. So we can write a predicate to express "Items is a list, and Left is a shorter or equal length list":

list_shorterorequal(_List, []).
list_shorterorequal([_|List], [_|ShorterOrEqual]) :-
    list_shorterorequal(List, ShorterOrEqual).

?- list_shorterorequal([a, b, c], ShorterOrEqual).
ShorterOrEqual = [] ;
ShorterOrEqual = [_G938] ;
ShorterOrEqual = [_G938, _G941] ;
ShorterOrEqual = [_G938, _G941, _G944].

And then you can adapt your first implementation like this:

repeat_pair_1x(Items) :-
    list_shorterorequal(Items, Left),
    append(_,[X,Y|_],Left),
    append(Left,[X,Y|_],Items).

And this succeeds once, then terminates quickly:

?- repeat_pair_1x([1,2,3,4,2,3,5]).
true ;
false.

If your Prolog has a between/3 predicate (they usually do), you could also use that to restrict the length of Left before we know anything else about Left itself:

repeat_pair_2x(Items) :-
    length(Items, ItemsLength),
    between(2, ItemsLength, LeftLength),
    length(Left, LeftLength),
    append(_,[X,Y|_],Left),
    append(Left,[X,Y|_],Items).

This also terminates nicely:

?- repeat_pair_2x([1,2,3,4,2,3,5]).
true ;
false.

Note that these two predicates behave slightly differently on the most general query, and in general if the argument of the call is not bound to a finite list:

?- repeat_pair_1x(Items).
Items = [_G918, _G924, _G918, _G924|_G940] ;
Items = [_G918, _G924, _G930, _G918, _G924|_G946] ;
Items = [_G918, _G924, _G930, _G924, _G930|_G949] ;
Items = [_G918, _G924, _G930, _G936, _G918, _G924|_G952] ;
Items = [_G918, _G924, _G930, _G936, _G924, _G930|_G955] .

?- repeat_pair_2x(Items).
Items = [_G918, _G921, _G918, _G921] ;
Items = [_G918, _G921, _G918, _G921, _G930] ;
Items = [_G918, _G921, _G924, _G918, _G921] ;
Items = [_G918, _G921, _G924, _G921, _G924] ;
Items = [_G918, _G921, _G918, _G921, _G930, _G933] .

By calling length/2 on Items in the second variant, we force it to be a finite list. There might be cases where you don't want this.

Upvotes: 2

Related Questions