Reputation:

Include non-unique elements only

This question was asked but there are no answers: here. I read the comments and tried to implement in both ways, but there are more problems that I don't understand.

I first tried the easy way that doesn't keep original order:

list_repeated(L, Ds) :-
    msort(L, S),
    sorted_repeated(S, Ds).

sorted_repeated([], []).
sorted_repeated([X|Xs], Ds) :-
    first(Xs, X, Ds).

first([], _, []).
first([X|Xs], X, [X|Ds]) :-
    more(Xs, X, Ds).
first([X|Xs], Y, Ds) :-
    dif(X, Y),
    first(Xs, X, Ds).

more([], _, []).
more([X|Xs], X, Ds) :-
    more(Xs, X, Ds).
more([X|Xs], Y, Ds) :-
    dif(X, Y),
    first(Xs, X, Ds).

Once the list is sorted without removing duplicates, using first and more I add the element to the second argument if it occurs at least twice and skip all consecutive copies of the element.

This is not working properly because if I have:

?- list_duplicates([b,a,a,a,b,b], Ds).

I get answer [a,b] instead of [b,a] and also I get ; false after the answer.

I also tried another way, but this doesn't work because the accumulator is immutable?

list_duplicates(L, Ds) :-
    ld_acc(L, [], Ds).

ld_acc([], _, []).
ld_acc([X|Xs], Acc, Ds) :-
    (   memberchk(X, Acc)
    ->  Ds = [X|Ds0],
        ld_acc(Xs, Acc, Ds0)
    ;   Acc1 = [X|Acc],
        ld_acc(Xs, Acc1, Ds)
    ).

This cannot work because when I check that an element is member of accumulator I remove only one occurrence of each element: if I have three times the same element in the first argument, I am left with two. If I could change the element in the accumulator then I could maybe put a counter on it? In the first version I used different states, first and more, but here I have to attach state to the elements of the accumulator, is that possible?

Upvotes: 3

Answers (2)

mat

Reputation: 40768

A plea for purity

When programming in Prolog, a major attraction is the generality we enjoy from pure relations.

This lets us use our code in multiple directions, and reason declaratively over our programs and answers.

You can enjoy these benefits if you keep your programs pure.

Possible solution

As always when describing lists, also consider using DCG notation. See dcg for more information.

For example, to describe the list of duplicates in a pure way, consider:

list_duplicates([]) --> [].
list_duplicates([L|Ls]) -->
        list_duplicates_(Ls, L),
        list_duplicates(Ls).

list_duplicates_([], _) --> [].
list_duplicates_([L0|Ls], L) -->
        if_(L0=L, [L], []),
        list_duplicates_(Ls, L).

This uses if_//3 to retain generality and determinism (if applicable).

Examples

Here are a few example queries and answers. We start with simple ground cases:

?- phrase(list_duplicates([a,b,c]), Ds).
Ds = [].

?- phrase(list_duplicates([a,b,a]), Ds).
Ds = [a].

Even the most impure version will be able to handle these situations correctly. So, slightly more interesting:

?- phrase(list_duplicates([a,b,X]), Ds).
X = a,
Ds = [a] ;
X = b,
Ds = [b] ;
Ds = [],
dif(X, b),
dif(X, a).

Pretty nice, isn't it? The last part says: Ds = [] is a solution if X is different from b and a. Note the pure relation dif/2 automatically appears in these residual goals and retains the relation's generality.

Here is an example with two variables:

?- phrase(list_duplicates([X,Y]), Ds).
X = Y,
Ds = [Y] ;
Ds = [],
dif(Y, X).

Finally, consider using iterative deepening to fairly enumerate answers for lists of arbitrary length:

?- length(Ls, _), phrase(list_duplicates(Ls), Ds).
Ls = Ds, Ds = [] ;
Ls = [_136],
Ds = [] ;
Ls = [_136, _136],
Ds = [_136] ;
Ls = [_156, _162],
Ds = [],
dif(_162, _156) ;
Ls = Ds, Ds = [_42, _42, _42] ;
Ls = [_174, _174, _186],
Ds = [_174],
dif(_186, _174) .

Multiple occurrences

Here is a version that handles arbitrary many occurrences of the same element in such a way that exactly a single occurrence is retained if (and only if) the element occurs at least twice:

list_duplicates(Ls, Ds) :-
        phrase(list_duplicates(Ls, []), Ds).

list_duplicates([], _) --> [].
list_duplicates([L|Ls], Ds0) -->
        list_duplicates_(Ls, L, Ds0, Ds),
        list_duplicates(Ls, Ds).

list_duplicates_([], _, Ds, Ds) --> [].
list_duplicates_([L0|Ls], L, Ds0, Ds) -->
        if_(L0=L, new_duplicate(L0, Ds0, Ds1), {Ds0 = Ds1}),
        list_duplicates_(Ls, L, Ds1, Ds).

new_duplicate(E, Ds0, Ds) -->
        new_duplicate_(Ds0, E, Ds0, Ds).

new_duplicate_([], E, Ds0, [E|Ds0]) --> [E].
new_duplicate_([L|Ls], E, Ds0, Ds)  -->
        if_(L=E,
            { Ds0 = Ds },
            new_duplicate_(Ls, E, Ds0, Ds)).

The query shown by @fatalize in the comments now yields:

?- list_duplicates([a,a,a], Ls).
Ls = [a].

The other examples yield the same results. For instance:

?- list_duplicates([a,b,c], Ds).
Ds = [].

?- list_duplicates([a,b,a], Ds).
Ds = [a].

?- list_duplicates([a,b,X], Ds).
X = a,
Ds = [a] ;
X = b,
Ds = [b] ;
Ds = [],
dif(X, b),
dif(X, a).

?- list_duplicates([X,Y], Ds).
X = Y,
Ds = [Y] ;
Ds = [],
dif(Y, X).

I leave the case ?- list_duplicates(Ls, Ls). as an exercise.

Generality: Multiple directions

Ideally, we want to be able to use a relation in all directions.

For example, our program should be able to answer questions like:

What does a list look like if its duplicates are [a,b]?

With the version shown above, we get:

?- list_duplicates(Ls, [a,b]).
nontermination

Luckily, a very simple change allows as to answer such questions!

One such change is to simply write:

list_duplicates(Ls, Ds) :-
        length(Ls, _),
        phrase(list_duplicates(Ls, []), Ds).

This is obviously declaratively admissible, because Ls must be a list. Operationally, this helps us to enumerate lists in a fair way.

We now get:

?- list_duplicates(Ls, [a,b]).
Ls = [a, a, b, b] ;
Ls = [a, b, a, b] ;
Ls = [a, b, b, a] ;
Ls = [a, a, a, b, b] ;
Ls = [a, a, b, a, b] ;
Ls = [a, a, b, b, a] ;
Ls = [a, a, b, b, b] ;
Ls = [a, a, b, b, _4632],
dif(_4632, b),
dif(_4632, a) ;
etc.

Here is a simpler case, using only a single element:

?- list_duplicates(Ls, [a]).
Ls = [a, a] ;
Ls = [a, a, a] ;
Ls = [a, a, _3818],
dif(_3818, a) ;
Ls = [a, _3870, a],
dif(_3870, a) ;
Ls = [_4058, a, a],
dif(a, _4058),
dif(a, _4058) ;
Ls = [a, a, a, a] ;
etc.

Maybe even more interesting:

What does a list without duplicates look like?

Our program answers:

?- list_duplicates(Ls, []).
Ls = [] ;
Ls = [_3240] ;
Ls = [_3758, _3764],
dif(_3764, _3758) ;
Ls = [_4164, _4170, _4176],
dif(_4176, _4164),
dif(_4176, _4170),
dif(_4170, _4164) .

Thus, the special case of a list where all elements are distinct naturally exists as a special case of the more general relation we have implemented.

We can use this relation to generate answers (as shown above), and also to test whether a list consists of distinct elements:

?- list_duplicates([a,b,c], []).
true.

?- list_duplicates([b,b], []).
false.

Unfortunately, the following even more general query still yields:

?- list_duplicates([b,b|_], []).
nontermination

On the plus side, if the length of the list is fixed, we get in such cases:

?- length(Ls, L), maplist(=(b), Ls),
   ( true ; list_duplicates(Ls, []) ).
Ls = [],
L = 0 ;
Ls = [],
L = 0 ;
Ls = [b],
L = 1 ;
Ls = [b],
L = 1 ;
Ls = [b, b],
L = 2 ;
Ls = [b, b, b],
L = 3 ;
Ls = [b, b, b, b],
L = 4 .

This is some indication that the program indeed terminates in such cases. Note that the answers are of course now too general.

Efficiency

It is well known in high-performance computing circles that as long as your program is fast enough, its correctness is barely worth considering.

So, the key question is of course: How can we make this faster?

I leave this is a very easy exercise. One way to make this faster in specific cases is to first check whether the given list is sufficiently instantiated. In that case, you can apply an ad hoc solution that fails terribly in more general cases, but has the extreme benefit that it is fast!

Upvotes: 2

Jim Ashworth

Reputation: 765

So as far as I can tell, you were on the right track with the accumulator, but this implementation definitely works as you want (assuming you want the duplicates in the order they first appear in the list).

list_duplicates(Input,Output) is just used to wrap and initialise the accumulator.

list_duplicates(Accumulator,[],Accumulator) unifies the accumulator with the output when we have finished processing the input list.

list_duplicates(Accumulator,[H|T],Output) says "if the head (H) of the input list is in the rest of the list (T), and is not in the Accumulator already, put it at the end of the Accumulator (using append), then recurse on the tail of the list".

list_duplicates(Accumulator,[_|T],Output) (which we only get to if either the head is not a duplicate, or is already in the Accumulator) just recurses on the tail of the list.

list_duplicates(Input,Output) :-
    once(list_duplicates([],Input,Output)).

list_duplicates(Accumulator,[],Accumulator).
list_duplicates(Accumulator,[H|T],Output) :-
    member(H,T),
    \+member(H,Accumulator),
    append(Accumulator,[H],NewAccumulator),
    list_duplicates(NewAccumulator,T,Output).
list_duplicates(Accumulator,[_|T],Output) :-
    list_duplicates(Accumulator,T,Output).

You could also recurse in list_duplicates(Accumulator,[H|T],Output) with list_duplicates([H|Accumulator],T,Output) and reverse in the wrapper, looking like this:

list_duplicates(Input,Output) :-
    once(list_duplicates([],Input,ReverseOutput)),
    reverse(ReverseOutput,Output).

list_duplicates(Accumulator,[],Accumulator).
list_duplicates(Accumulator,[H|T],Output) :-
    member(H,T),
    \+member(H,Accumulator),
    list_duplicates([H|Accumulator],T,Output).
list_duplicates(Accumulator,[_|T],Output) :-
    list_duplicates(Accumulator,T,Output).

The once call in the wrapper prevents the false output (or in this case, partial duplicate lists due to a lack of guards on the second rule).

Upvotes: 0