Reputation: 13750
In Haskell, takeWhile
allows one to take entries from a (potentially infinite) list until a certain condition does not hold.
However, this condition can't depend on the previous entries of the list.
How can I take
entries from a (potentially infinite) list until I encounter the first duplicate as outlined in this example?
*Main> takeUntilDuplicate [1,2,3,4,5,1,2,3,4]
[1,2,3,4,5]
Upvotes: 3
Views: 1774
Reputation: 48611
Assuming that you're dealing with Ord
instances, you can do it like this. This is essentially the same idea as Luis Casillas's answer, but expressed using a fold instead of State
. Each of our answers uses a different generally applicable technique. Luis includes an excellent explanation of his; the classic explanation of mine is in "A Tutorial on the Universality and Expressiveness of Fold" by Graham Hutton.
import Data.Set (member)
import qualified Data.Set as S
takeUntilDuplicate :: Ord a => [a] -> [a]
takeUntilDuplicate xs = foldr go (const []) xs S.empty
where
go x cont set
| x `member` set = []
| otherwise = x : cont (S.insert x set)
If you are actually dealing with Int
(or anything that can be converted to an Int
really quickly), you can improve the performance substantially by using Data.IntSet
or Data.HashSet
instead of Data.Set
.
Upvotes: 7
Reputation: 30237
One way to go about this is to update a piece of state as you traverse the list, similar to what you'd do in an imperative language. This requires working with the State
monad, which might take some studying and playing around to get it if it's your first time, but trust me, this is well worth learning. Let's start with imports:
import Control.Monad.State
import Data.Set (Set)
import qualified Data.Set as Set
The state we're going to keep is the Set
of elements seen up to that point in the list. So first, let's write a pair of simple State
actions to manage the set of seen elements:
-- Add an element to the context Set
remember :: Ord a => a -> State (Set a) ()
remember a = modify (Set.insert a)
-- Test if the context set contains an element
haveSeen :: Ord a => a -> State (Set a) Bool
haveSeen a = do seen <- get
return (a `Set.member` seen)
Now we're going to combine these two into an action that checks for duplication:
isDuplicate :: Ord a => a -> State (Set a) Bool
isDuplicate a = do seen <- haveSeen a
remember a
return seen
You've mentioned the takeWhile
function. We're going to build our solution along similar lines. This is takeWhile
's definition:
-- different name to avoid collision
takeWhile' :: (a -> Bool) -> [a] -> [a]
takeWhile' _ [] = []
takeWhile' p (a:as)
| p a = a : takeWhile p as
| otherwise = []
We can modify this function to work with a predicate that has the Bool
wrapped inside a monad:
takeWhileM :: Monad m => (a -> m Bool) -> [a] -> m [a]
takeWhileM _ [] = return []
takeWhileM p (a:as) =
do test <- p a
if test
then do rest <- takeWhileM p as
return (a:rest)
else return []
But the key difference here is that because the test in takeWhileM
is monadic, we can use our stateful isDuplicate
from above. So each time we test an element of the list with isDuplicate
, we will also record that element in the Set
that's being threaded through the computation. So now we can write takeUntilDuplicate
like this:
takeUntilDuplicate :: Ord a => [a] -> [a]
takeUntilDuplicate as = evalState (takeUntilDuplicate' as) Set.empty
where takeUntilDuplicate' :: Ord a => [a] -> State (Set a) [a]
takeUntilDuplicate' = takeWhileM (fmap not . isDuplicate)
Example use (with an infinite list argument):
>>> takeUntilDuplicate (cycle [1..5])
[1,2,3,4,5]
And what's neat is that several of these pieces of code could be reused for similar problems.
Upvotes: 10
Reputation: 13750
You can use a modified version of this duplicate removal function:
takeUntilDuplicate :: Eq a => [a] -> [a]
takeUntilDuplicate = helper []
where helper seen [] = seen
helper seen (x:xs)
| x `elem` seen = seen
| otherwise = helper (seen ++ [x]) xs
Note that elem
is quite inefficient for large lists. Assuming a
(the data type inside the list) is an Ord
type, this function could be improved by using Data.Set
for membership lookup:
import qualified Data.Set as Set
takeUntilDuplicate' :: (Eq a, Ord a) => [a] -> [a]
takeUntilDuplicate' = helper Set.empty []
where helper seenSet seen [] = seen
helper seenSet seen (x:xs)
| x `Set.member` seenSet = seen
| otherwise = helper (Set.insert x seenSet) (seen ++ [x]) xs
If you don't care about the order of the results returned, the efficiency of the function can be improved further by returning a Set
:
import qualified Data.Set as Set import Data.Set (Set)
takeUntilDuplicateSet :: (Eq a, Ord a) => [a] -> Set a
takeUntilDuplicateSet = helper Set.empty
where helper seenSet [] = seenSet
helper seenSet (x:xs)
| x `Set.member` seenSet = seenSet
| otherwise = helper (Set.insert x seenSet) xs
Upvotes: 2
Reputation: 43393
Your point that takeWhile
doesn't work because you have no contextual information for the individual elements suggests the following strategy: get it.
This answer of mine makes reference to the decorate-with-context operation, which I called picks
(because it shows you all the way to pick one element on which to focus). It's the general decorate-with-its-context operation that we just ought to have for free for every containery thing. For lists, it is
picks :: [x] -> [(x, ([x], [x]))] -- [(x-here, ([x-before], [x-after]))]
picks [] = []
picks (x : xs) = (x, ([], xs)) : [(y, (x : ys, zs)) | (y, (ys, zs)) <- picks xs]
and it works perfectly well for infinite lists, while we're about it.
Now have a go with
takeUntilDuplicate :: Eq x => [x] -> [x]
takeUntilDuplicate = map fst . takeWhile (\ (x, (ys, _)) -> not (elem x ys)) . picks
(Curiously, I'm disturbed that the above one-liner is rejected for ambiguity of Eq
if not given the above type signature. I'm tempted to ask a question about it, here. Oh, it's the monomorphism restriction. How annoying.)
Remark. It makes a lot of sense to (and I normally would) represent the "elements before" component that picks
delivers using snoc-lists (lists which grow on the right), the better to preserve sharing and visual left-to-right-ness.
Upvotes: 5