Reputation: 998

Weird Common Lisp intersection behaviour

I'm trying to get the common elements of two lists. I've tried both the available intersection function and one I implemented myself, both giving the same weird result when trying to test them on lists like (a a ... a) and (a b c d ... z).

Whenever the first list contains only the same element several times and the second list begins with that element the result is the first list.

For example: (intersection '(2 2 2 2) '(2 2 2 3)) returns (2 2 2 2)

The intersection I implemented:

(defun presentp (a l)
  (cond ((null l) nil)
        ((and (atom (car l)) (equal a (car l))) t)
        ((not (atom (car l))) (presentp a (car l)))
        (t (presentp a (cdr l)))))

(defun intersectionp (a b)
  (cond ((not (and a b)) nil)
        ((presentp (car a) b) (append (list (car a)) (intersection (cdr a) b)))
        (t (intersection (cdr a) b))))

How can I get a good result on lists of that type? For example I want (2 2 2) from (intersection '(2 2 2 2) '(2 2 2 3)).

Upvotes: 2

Answers (3)

min

Reputation: 1

Here is mine that works well. I used remove to remove duplicating symbols.

(defun my-intersection (x y)
   (cond ((or (null x) (null y)) nil)
     ((find (first x) y) (cons (first x)
                   (my-intersection (remove (first x) x) y)))
     (t (my-intersection (rest x) y))))

Upvotes: 0

Joshua Taylor

Reputation: 85913

There's already an accepted answer, but I want to point out that the answer the implementation provides, where

(cl:intersection '(2 2 2 2) '(2 2 2 3))
;=> (2 2 2 2)

is correct. It's important to recognize that the intersection, nintersection, etc., are intended for use with lists that are being treated as sets. Conceptually, a set has no duplicate elements (for that you'd need a multiset), so the lists (2), (2 2), (2 2 2), etc., all represent the same set, {2}.

14.1.2.2 Lists as Sets

Lists are sometimes viewed as sets by considering their elements unordered and by assuming there is no duplication of elements.
adjoin         nset-difference    set-difference    union  
intersection   nset-exclusive-or  set-exclusive-or         
nintersection  nunion             subsetp
Figure 14-5. Some defined names related to sets.

Now, that bit about "assuming there is no duplication of elements" actually means that you probably shouldn't be using the set functions with a list like (2 2 2 2), since there's obvious duplication of elements. Even so, if you posit that lists like (2 2 2) and (2 2 2 2) represent the same set, you can see that intersection is actually giving you the correct set back. I think that the specification actually mandates that the result will have three or four elements. From the HyperSpec entry on intersection:

The intersection operation is described as follows. For all possible ordered pairs consisting of one element from list-1 and one element from list-2, :test or :test-not are used to determine whether they satisfy the test. The first argument to the :test or :test-not function is an element of list-1; the second argument is an element of list-2. If :test or :test-not is not supplied, eql is used. It is an error if :test and :test-not are supplied in the same function call. …

For every pair that satifies the test, exactly one of the two elements of the pair will be put in the result. No element from either list appears in the result that does not satisfy the test for an element from the other list. If one of the lists contains duplicate elements, there may be duplication in the result.

So, in the case of (2 2 2 2) and (2 2 2 3), there are 16 pairs to consider:

(2 2) (2 2) (2 2) (2 3) ; first element is first  2 from list-1, second elements are from list-2 
(2 2) (2 2) (2 2) (2 3) ; first element is second 2 from list-1, second elements are from list-2 
(2 2) (2 2) (2 2) (2 3) ; first element is third  2 from list-1, second elements are from list-2 
(2 2) (2 2) (2 2) (2 3) ; first element is fourth 2 from list-1, second elements are from list-2

Since "For every pair that satifies the test, exactly one of the two elements of the pair will be put in the result," it seems to me that you're going to end up with between 3 and 4 2's in the result, because you've got 12 pairs that satisfy the test, and you need to cover each row and column of those 12 pairs. This hinges, I suppose, on the interpretation of "exactly one of the two elements of the pair will be put in the result". In general though, if you have, e.g., lists-as-sets (a1 a2) and (b1 b2 b3) then you have the pairs:

(a1 b1) (a1 b2) (a1 b3)
(a2 b1) (a2 b2) (a2 b3)

I think that the spec should be read as saying that each ai and bi will be included at most once, and that you never include a given ai and bi based on the particular pair (ai bi). So, if from row one you were to select (a1 b2) and include b2 in the result, then the remaining pairs that could contribute elements to the result are

(a1 b1) (a1 b3)
(a2 b1) (a2 b3)

if you had taken a1 from (a1 b2), then the remaining pairs would be

(a2 b1) (a2 b2) (a2 b3)

That is, when you include an element from one of the pairs, you've either removed a row or a column from the table of pairs that determine the possible results. In the first case, you could still add two more elements to the result, but in the second, there could be three.

In fact, in LispWorks, if you reverse the order of the arguments, you'll get the 3 element version:

CL-USER 5 > (intersection '(2 2 2 3) '(2 2 2 2))
(2 2 2)

There is no guarantee that the order of elements in the result will reflect the ordering of the arguments in any particular way. The result list may share cells with, or be eq to, either list-1 or list-2 if appropriate.

You didn't mention whether you're just getting an equivalent list back, or if you're actually getting list-1 back. In Lispworks, it seems that you're actually getting the same list back, although that's not required:

CL-USER 2 > (let ((l1 '(2 2 2 2))
                  (l2 '(2 2 2 3)))
              (eq l1 (intersection l1 l2)))
T

Upvotes: 3

Sylwester

Reputation: 48775

You need to remove matches from the b list.. When you found an 2 in (2 2 2 3) you should continue with (2 2 3) as b.

Also.. (append (list x) result-list) is the same as (cons x result-list) just with the same or fewer CPU cylces.

(defun intersection (a b)
  (cond ((not (and a b)) nil)
        ((presentp (car a) b)
         (cons (car a) 
               (intersection (cdr a) 
                             (remove (car a) b :count 1))))
        (t (intersection (cdr a) b))))

Upvotes: 4

Weird Common Lisp intersection behaviour

Answers (3)

14.1.2.2 Lists as Sets

Related Questions