Reputation: 5385

Intersection of infinite lists

I know from computability theory that it is possible to take the intersection of two infinite lists, but I can't find a way to express it in Haskell.

The traditional method fails as soon as the second list is infinite, because you spend all your time checking it for a non-matching element in the first list.

Example:

let ones = 1 : ones -- an unending list of 1s
intersect [0,1] ones

This never yields 1, as it never stops checking ones for the element 0.

A successful method needs to ensure that each element of each list will be visited in finite time.

Probably, this will be by iterating through both lists, and spending approximately equal time checking all previously-visited elements in each list against each other.

If possible, I'd like to also have a way to ignore duplicates in the lists, as it is occasionally necessary, but this is not a requirement.

Upvotes: 9

Answers (7)

jpmarinier

Reputation: 4733

You can achieve this with:

one function to generate all possible (x,y) pairs for comparison of x and y
some post-processing function (ppf) to extract intersection elements.

The first one is of course the trickiest to write. A pedestrian, Prelude-only solution can be (using something useful for polynomial multiplication):


layeredPairLists :: [a] -> [b] -> [[(a,b)]]
layeredPairLists xs ys = takeWhile (not . null) $ map extract states  where
    extract (us,vs,rvs) = zip us rvs
    states = drop 1 $ iterate sfn (xs,ys,[])
    sfn (us,vs,rvs) = if (null vs)  then  (drop 1 us,[],rvs)
                                    else  (us, tail vs, (head vs):rvs)

intersect :: Eq a => [a] -> [a] -> [a]
intersect xs ys = concatMap ppf (layeredPairLists xs ys)  where
    ppf = (map fst) . (filter (\(x,y) -> x==y))

The first function generates pairs in increasing order of the sum of indexes:

$ ghci
GHCi, version 9.4.5: https://www.haskell.org/ghc/  :? for help
ghci> 
ghci> :load q42300273.hs
[1 of 2] Compiling Main             ( q42300273.hs, interpreted )
Ok, one module loaded.
ghci> 
ghci> printAsLines = mapM_ (putStrLn . show)
ghci> 
ghci> printAsLines $ take 7 $ layeredPairLists [0..] [0..]
[(0,0)]
[(0,1),(1,0)]
[(0,2),(1,1),(2,0)]
[(0,3),(1,2),(2,1),(3,0)]
[(0,4),(1,3),(2,2),(3,1),(4,0)]
[(0,5),(1,4),(2,3),(3,2),(4,1),(5,0)]
[(0,6),(1,5),(2,4),(3,3),(4,2),(5,1),(6,0)]
ghci>

Testing:

ghci> 
ghci> m3s  =  map (3*)  [0..]
ghci> m10s =  map (10*) [0..]
ghci> 
ghci> take 10 m3s
[0,3,6,9,12,15,18,21,24,27]
ghci> 
ghci> take 10 m10s
[0,10,20,30,40,50,60,70,80,90]
ghci> 
ghci> take 21 $ intersect m3s m10s
[0,30,60,90,120,150,180,210,240,270,300,330,360,390,420,450,480,510,540,570,600]
ghci>

Upvotes: 0

Daniel Wagner

Reputation: 152837

Using the universe package's Cartesian product operator we can write this one-liner:

import Data.Universe.Helpers

isect :: Eq a => [a] -> [a] -> [a]
xs `isect` ys = [x | (x, y) <- xs +*+ ys, x == y]
-- or this, which may do marginally less allocation
xs `isect` ys = foldr ($) [] $ cartesianProduct 
    (\x y -> if x == y then (x:) else id)
    xs ys

Try it in ghci:

> take 10 $ [0,2..] `isect` [0,3..]
[0,6,12,18,24,30,36,42,48,54]

This implementation will not produce any duplicates if the input lists don't have any; but if they do, you can tack on your favorite dup-remover either before or after calling isect. For example, with nub, you might write

> nub ([0,1] `isect` repeat 1)
[1

and then heat up your computer pretty good, since it can never be sure there might not be a 0 in that second list somewhere if it looks deep enough.

This approach is significantly faster than David Fletcher's, produces many fewer duplicates and produces new values much more quickly than Willem Van Onsem's, and doesn't assume the lists are sorted like freestyle's (but is consequently much slower on such lists than freestyle's).

Upvotes: 12

willeM_ Van Onsem

Reputation: 476659

An idea might be to use incrementing bounds. Let is first relax the problem a bit: yielding duplicated values is allowed. In that case you could use:

import Data.List (intersect)

intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite = intersectInfinite' 1
    where intersectInfinite' n = intersect (take n xs) (take n ys) ++ intersectInfinite' (n+1)

In other words we claim that:

A∩B = A₁∩B₁ ∪ A₂∩B₂ ∪ ... ∪ ...

with A₁ is a set containing the first i elements of A (yes there is no order in a set, but let's say there is somehow an order). If the set contains less elements then the full set is returned.

If c is in A (at index i) and in B (at index j), c will be emitted in segment (not index) max(i,j).

This will thus always generate an infinite list (with an infinite amount of duplicates) regardless whether the given lists are finite or not. The only exception is when you give it an empty list, in which case it will take forever. Nevertheless we here ensured that every element in the intersection will be emitted at least once.

Making the result finite (if the given lists are finite)

Now we can make our definition better. First we make a more advanced version of take, takeFinite (let's first give a straight-forward, but not very efficient defintion):

takeFinite :: Int -> [a] -> (Bool,[a])
takeFinite _ [] = (True,[])
takeFinite 0 _  = (False,[])
takeFinite n (x:xs) = let (b,t) = takeFinite (n-1) xs in (b,x:t)

Now we can iteratively deepen until both lists have reached the end:

intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite = intersectInfinite' 1

intersectInfinite' :: Eq a => Int -> [a] -> [a] -> [a]
intersectInfinite' n xs ys | fa && fb = intersect xs ys
                           | fa = intersect ys xs
                           | fb = intersect xs ys
                           | otherwise = intersect xfa xfb ++ intersectInfinite' (n+1) xs ys
    where (fa,xfa) = takeFinite n xs
          (fb,xfb) = takeFinite n ys

This will now terminate given both lists are finite, but still produces a lot of duplicates. There are definitely ways to resolve this issue more.

Upvotes: 6

Zoey Hewll

Reputation: 5385

Solution

I ended up using the following implementation; a slight modification of the answer by David Fletcher:

isect :: Eq a => [a] -> [a] -> [a]
isect [] = const [] -- don't bother testing against an empty list
isect xs = catMaybes . diagonal . map matches
    where matches y = [if x == y then Just x else Nothing | x <- xs]

This can be augmented with nub to filter out duplicates:

isectUniq :: Eq a => [a] -> [a] -> [a]
isectUniq xs = nub . isect xs

Explanation

Of the line isect xs = catMaybes . diagonal . map matches

(map matches) ys computes a list of lists of comparisons between elements of xs and ys, where the list indices specify the indices in ys and xs respectively: i.e (map matches) ys !! 3 !! 0 would represent the comparison of ys !! 3 with xs !! 0, which would be Nothing if those values differ. If those values are the same, it would be Just that value.

diagonals takes a list of lists and returns a list of lists where the nth output list contains an element each from the first n lists. Another way to conceptualise it is that (diagonals . map matches) ys !! n contains comparisons between elements whose indices in xs and ys sum to n.
diagonal is simply a flat version of diagonals (diagonal = concat diagonals)

Therefore (diagonal . map matches) ys is a list of comparisons between elements of xs and ys, where the elements are approximately sorted by the sum of the indices of the elements of ys and xs being compared; this means that early elements are compared to later elements with the same priority as middle elements being compared to each other.

(catMaybes . diagonal . map matches) ys is a list of only the elements which are in both lists, where the elements are approximately sorted by the sum of the indices of the two elements being compared.

Note
(diagonal . map (catMaybes . matches)) ys does not work: catMaybes . matches only yields when it finds a match, instead of also yielding Nothing on no match, so the interleaving does nothing to distribute the work.

To contrast, in the chosen solution, the interleaving of Nothing and Just values by diagonal means that the program divides its attention between 'searching' for multiple different elements, not waiting for one to succeed; whereas if the Nothing values are removed before interleaving, the program may spend too much time waiting for a fruitless 'search' for a given element to succeed.

Therefore, we would encounter the same problem as in the original question: while one element does not match any elements in the other list, the program will hang; whereas the chosen solution will only hang while no matches are found for any elements in either list.

Upvotes: 0

chi

Reputation: 116139

Here's yet another alternative, leveraging Control.Monad.WeightedSearch

import Control.Monad (guard)
import Control.Applicative
import qualified Control.Monad.WeightedSearch as W

We first define a cost for digging inside the list. Accessing the tail costs 1 unit more. This will ensure a fair scheduling among the two infinite lists.

eachW :: [a] -> W.T Int a
eachW = foldr (\x w -> pure x <|> W.weight 1 w) empty

Then, we simply disregard infinite lists.

intersection :: [Int] -> [Int] -> [Int]
intersection xs ys = W.toList $ do
   x <- eachW xs
   y <- eachW ys
   guard (x==y)
   return y

Even better with MonadComprehensions on:

intersection2 :: [Int] -> [Int] -> [Int]
intersection2 xs ys = W.toList [ y | x <- eachW xs, y <- eachW ys, x==y ]

Upvotes: 2

freestyle

Reputation: 3790

If elements in the lists are ordered then you can easy to do that.

intersectOrd :: Ord a => [a] -> [a] -> [a]
intersectOrd [] _ = []
intersectOrd _ [] = []
intersectOrd (x:xs) (y:ys) = case x `compare` y of
    EQ -> x : intersectOrd xs ys
    LT -> intersectOrd xs (y:ys)
    GT -> intersectOrd (x:xs) ys

Upvotes: 5

David Fletcher

Reputation: 2818

Here's one way. For each x we make a list of maybes which has Just x only where x appeared in ys. Then we interleave all these lists.

isect :: Eq a => [a] -> [a] -> [a]
isect xs ys = (catMaybes . foldr interleave [] . map matches) xs
  where
    matches x = [if x == y then Just x else Nothing | y <- ys]

interleave :: [a] -> [a] -> [a]
interleave [] ys = ys
interleave (x:xs) ys = x : interleave ys xs

Maybe it can be improved using some sort of fairer interleaving - it's already pretty slow on the example below because (I think) it's doing an exponential amount of work.

> take 10 (isect [0..] [0,2..])
[0,2,4,6,8,10,12,14,16,18]

Upvotes: 5

Intersection of infinite lists

Answers (7)

Solution

Explanation

Related Questions