Reputation: 5385
I know from computability theory that it is possible to take the intersection of two infinite lists, but I can't find a way to express it in Haskell.
The traditional method fails as soon as the second list is infinite, because you spend all your time checking it for a non-matching element in the first list.
Example:
let ones = 1 : ones -- an unending list of 1s
intersect [0,1] ones
This never yields 1
, as it never stops checking ones
for the element 0
.
A successful method needs to ensure that each element of each list will be visited in finite time.
Probably, this will be by iterating through both lists, and spending approximately equal time checking all previously-visited elements in each list against each other.
If possible, I'd like to also have a way to ignore duplicates in the lists, as it is occasionally necessary, but this is not a requirement.
Upvotes: 9
Views: 1045
Reputation: 4733
You can achieve this with:
ppf
) to extract intersection elements.The first one is of course the trickiest to write. A pedestrian, Prelude-only solution can be (using something useful for polynomial multiplication):
layeredPairLists :: [a] -> [b] -> [[(a,b)]]
layeredPairLists xs ys = takeWhile (not . null) $ map extract states where
extract (us,vs,rvs) = zip us rvs
states = drop 1 $ iterate sfn (xs,ys,[])
sfn (us,vs,rvs) = if (null vs) then (drop 1 us,[],rvs)
else (us, tail vs, (head vs):rvs)
intersect :: Eq a => [a] -> [a] -> [a]
intersect xs ys = concatMap ppf (layeredPairLists xs ys) where
ppf = (map fst) . (filter (\(x,y) -> x==y))
The first function generates pairs in increasing order of the sum of indexes:
$ ghci
GHCi, version 9.4.5: https://www.haskell.org/ghc/ :? for help
ghci>
ghci> :load q42300273.hs
[1 of 2] Compiling Main ( q42300273.hs, interpreted )
Ok, one module loaded.
ghci>
ghci> printAsLines = mapM_ (putStrLn . show)
ghci>
ghci> printAsLines $ take 7 $ layeredPairLists [0..] [0..]
[(0,0)]
[(0,1),(1,0)]
[(0,2),(1,1),(2,0)]
[(0,3),(1,2),(2,1),(3,0)]
[(0,4),(1,3),(2,2),(3,1),(4,0)]
[(0,5),(1,4),(2,3),(3,2),(4,1),(5,0)]
[(0,6),(1,5),(2,4),(3,3),(4,2),(5,1),(6,0)]
ghci>
Testing:
ghci>
ghci> m3s = map (3*) [0..]
ghci> m10s = map (10*) [0..]
ghci>
ghci> take 10 m3s
[0,3,6,9,12,15,18,21,24,27]
ghci>
ghci> take 10 m10s
[0,10,20,30,40,50,60,70,80,90]
ghci>
ghci> take 21 $ intersect m3s m10s
[0,30,60,90,120,150,180,210,240,270,300,330,360,390,420,450,480,510,540,570,600]
ghci>
Upvotes: 0
Reputation: 152837
Using the universe package's Cartesian product operator we can write this one-liner:
import Data.Universe.Helpers
isect :: Eq a => [a] -> [a] -> [a]
xs `isect` ys = [x | (x, y) <- xs +*+ ys, x == y]
-- or this, which may do marginally less allocation
xs `isect` ys = foldr ($) [] $ cartesianProduct
(\x y -> if x == y then (x:) else id)
xs ys
Try it in ghci:
> take 10 $ [0,2..] `isect` [0,3..]
[0,6,12,18,24,30,36,42,48,54]
This implementation will not produce any duplicates if the input lists don't have any; but if they do, you can tack on your favorite dup-remover either before or after calling isect
. For example, with nub
, you might write
> nub ([0,1] `isect` repeat 1)
[1
and then heat up your computer pretty good, since it can never be sure there might not be a 0
in that second list somewhere if it looks deep enough.
This approach is significantly faster than David Fletcher's, produces many fewer duplicates and produces new values much more quickly than Willem Van Onsem's, and doesn't assume the lists are sorted like freestyle's (but is consequently much slower on such lists than freestyle's).
Upvotes: 12
Reputation: 476659
An idea might be to use incrementing bounds. Let is first relax the problem a bit: yielding duplicated values is allowed. In that case you could use:
import Data.List (intersect)
intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite = intersectInfinite' 1
where intersectInfinite' n = intersect (take n xs) (take n ys) ++ intersectInfinite' (n+1)
In other words we claim that:
A∩B = A1∩B1 ∪ A2∩B2 ∪ ... ∪ ...
with A1 is a set containing the first i elements of A (yes there is no order in a set, but let's say there is somehow an order). If the set contains less elements then the full set is returned.
If c is in A (at index i) and in B (at index j), c will be emitted in segment (not index) max(i,j).
This will thus always generate an infinite list (with an infinite amount of duplicates) regardless whether the given lists are finite or not. The only exception is when you give it an empty list, in which case it will take forever. Nevertheless we here ensured that every element in the intersection will be emitted at least once.
Making the result finite (if the given lists are finite)
Now we can make our definition better. First we make a more advanced version of take
, takeFinite
(let's first give a straight-forward, but not very efficient defintion):
takeFinite :: Int -> [a] -> (Bool,[a])
takeFinite _ [] = (True,[])
takeFinite 0 _ = (False,[])
takeFinite n (x:xs) = let (b,t) = takeFinite (n-1) xs in (b,x:t)
Now we can iteratively deepen until both lists have reached the end:
intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite = intersectInfinite' 1
intersectInfinite' :: Eq a => Int -> [a] -> [a] -> [a]
intersectInfinite' n xs ys | fa && fb = intersect xs ys
| fa = intersect ys xs
| fb = intersect xs ys
| otherwise = intersect xfa xfb ++ intersectInfinite' (n+1) xs ys
where (fa,xfa) = takeFinite n xs
(fb,xfb) = takeFinite n ys
This will now terminate given both lists are finite, but still produces a lot of duplicates. There are definitely ways to resolve this issue more.
Upvotes: 6
Reputation: 5385
I ended up using the following implementation; a slight modification of the answer by David Fletcher:
isect :: Eq a => [a] -> [a] -> [a]
isect [] = const [] -- don't bother testing against an empty list
isect xs = catMaybes . diagonal . map matches
where matches y = [if x == y then Just x else Nothing | x <- xs]
This can be augmented with nub to filter out duplicates:
isectUniq :: Eq a => [a] -> [a] -> [a]
isectUniq xs = nub . isect xs
Of the line isect xs = catMaybes . diagonal . map matches
(map matches) ys
computes a list of lists of comparisons between elements of xs
and ys
, where the list indices specify the indices in ys
and xs
respectively: i.e (map matches) ys !! 3 !! 0
would represent the comparison of ys !! 3
with xs !! 0
, which would be Nothing
if those values differ. If those values are the same, it would be Just
that value.
diagonals
takes a list of lists and returns a list of lists where the nth output list contains an element each from the first n lists. Another way to conceptualise it is that (diagonals . map matches) ys !! n
contains comparisons between elements whose indices in xs
and ys
sum to n
.
diagonal
is simply a flat version of diagonals
(diagonal = concat diagonals
)
Therefore (diagonal . map matches) ys
is a list of comparisons between elements of xs
and ys
, where the elements are approximately sorted by the sum of the indices of the elements of ys
and xs
being compared; this means that early elements are compared to later elements with the same priority as middle elements being compared to each other.
(catMaybes . diagonal . map matches) ys
is a list of only the elements which are in both lists, where the elements are approximately sorted by the sum of the indices of the two elements being compared.
Note
(diagonal . map (catMaybes . matches)) ys
does not work: catMaybes . matches
only yields when it finds a match, instead of also yielding Nothing
on no match, so the interleaving does nothing to distribute the work.
To contrast, in the chosen solution, the interleaving of Nothing
and Just
values by diagonal
means that the program divides its attention between 'searching' for multiple different elements, not waiting for one to succeed; whereas if the Nothing
values are removed before interleaving, the program may spend too much time waiting for a fruitless 'search' for a given element to succeed.
Therefore, we would encounter the same problem as in the original question: while one element does not match any elements in the other list, the program will hang; whereas the chosen solution will only hang while no matches are found for any elements in either list.
Upvotes: 0
Reputation: 116139
Here's yet another alternative, leveraging Control.Monad.WeightedSearch
import Control.Monad (guard)
import Control.Applicative
import qualified Control.Monad.WeightedSearch as W
We first define a cost for digging inside the list. Accessing the tail costs 1 unit more. This will ensure a fair scheduling among the two infinite lists.
eachW :: [a] -> W.T Int a
eachW = foldr (\x w -> pure x <|> W.weight 1 w) empty
Then, we simply disregard infinite lists.
intersection :: [Int] -> [Int] -> [Int]
intersection xs ys = W.toList $ do
x <- eachW xs
y <- eachW ys
guard (x==y)
return y
Even better with MonadComprehensions
on:
intersection2 :: [Int] -> [Int] -> [Int]
intersection2 xs ys = W.toList [ y | x <- eachW xs, y <- eachW ys, x==y ]
Upvotes: 2
Reputation: 3790
If elements in the lists are ordered then you can easy to do that.
intersectOrd :: Ord a => [a] -> [a] -> [a]
intersectOrd [] _ = []
intersectOrd _ [] = []
intersectOrd (x:xs) (y:ys) = case x `compare` y of
EQ -> x : intersectOrd xs ys
LT -> intersectOrd xs (y:ys)
GT -> intersectOrd (x:xs) ys
Upvotes: 5
Reputation: 2818
Here's one way. For each x
we make a list of maybes which has
Just x
only where x
appeared in ys
. Then we interleave all
these lists.
isect :: Eq a => [a] -> [a] -> [a]
isect xs ys = (catMaybes . foldr interleave [] . map matches) xs
where
matches x = [if x == y then Just x else Nothing | y <- ys]
interleave :: [a] -> [a] -> [a]
interleave [] ys = ys
interleave (x:xs) ys = x : interleave ys xs
Maybe it can be improved using some sort of fairer interleaving - it's already pretty slow on the example below because (I think) it's doing an exponential amount of work.
> take 10 (isect [0..] [0,2..])
[0,2,4,6,8,10,12,14,16,18]
Upvotes: 5