Uli Köhler
Uli Köhler

Reputation: 13750

Taking from a list until encountering a duplicate

In Haskell, takeWhile allows one to take entries from a (potentially infinite) list until a certain condition does not hold.

However, this condition can't depend on the previous entries of the list.

How can I take entries from a (potentially infinite) list until I encounter the first duplicate as outlined in this example?

*Main> takeUntilDuplicate [1,2,3,4,5,1,2,3,4]
[1,2,3,4,5]

Upvotes: 3

Views: 1774

Answers (4)

dfeuer
dfeuer

Reputation: 48611

Assuming that you're dealing with Ord instances, you can do it like this. This is essentially the same idea as Luis Casillas's answer, but expressed using a fold instead of State. Each of our answers uses a different generally applicable technique. Luis includes an excellent explanation of his; the classic explanation of mine is in "A Tutorial on the Universality and Expressiveness of Fold" by Graham Hutton.

import Data.Set (member)
import qualified Data.Set as S

takeUntilDuplicate :: Ord a => [a] -> [a]
takeUntilDuplicate xs = foldr go (const []) xs S.empty
  where
    go x cont set
      | x `member` set = []
      | otherwise      = x : cont (S.insert x set)

If you are actually dealing with Int (or anything that can be converted to an Int really quickly), you can improve the performance substantially by using Data.IntSet or Data.HashSet instead of Data.Set.

Upvotes: 7

Luis Casillas
Luis Casillas

Reputation: 30237

One way to go about this is to update a piece of state as you traverse the list, similar to what you'd do in an imperative language. This requires working with the State monad, which might take some studying and playing around to get it if it's your first time, but trust me, this is well worth learning. Let's start with imports:

import Control.Monad.State
import Data.Set (Set)
import qualified Data.Set as Set

The state we're going to keep is the Set of elements seen up to that point in the list. So first, let's write a pair of simple State actions to manage the set of seen elements:

-- Add an element to the context Set
remember :: Ord a => a -> State (Set a) ()
remember a = modify (Set.insert a)

-- Test if the context set contains an element
haveSeen :: Ord a => a -> State (Set a) Bool
haveSeen a = do seen <- get
                return (a `Set.member` seen)

Now we're going to combine these two into an action that checks for duplication:

isDuplicate :: Ord a => a -> State (Set a) Bool
isDuplicate a = do seen <- haveSeen a
                   remember a
                   return seen

You've mentioned the takeWhile function. We're going to build our solution along similar lines. This is takeWhile's definition:

-- different name to avoid collision
takeWhile' :: (a -> Bool) -> [a] -> [a]
takeWhile' _ [] =  []
takeWhile' p (a:as)
    | p a =  a : takeWhile p as
    | otherwise =  []

We can modify this function to work with a predicate that has the Bool wrapped inside a monad:

takeWhileM :: Monad m => (a -> m Bool) -> [a] -> m [a]
takeWhileM _ [] = return []
takeWhileM p (a:as) = 
    do test <- p a
       if test
       then do rest <- takeWhileM p as
               return (a:rest)
       else return []

But the key difference here is that because the test in takeWhileM is monadic, we can use our stateful isDuplicate from above. So each time we test an element of the list with isDuplicate, we will also record that element in the Set that's being threaded through the computation. So now we can write takeUntilDuplicate like this:

takeUntilDuplicate :: Ord a => [a] -> [a]
takeUntilDuplicate as = evalState (takeUntilDuplicate' as) Set.empty
    where takeUntilDuplicate' :: Ord a => [a] -> State (Set a) [a]
          takeUntilDuplicate' = takeWhileM (fmap not . isDuplicate)

Example use (with an infinite list argument):

 >>> takeUntilDuplicate (cycle [1..5])
 [1,2,3,4,5]

And what's neat is that several of these pieces of code could be reused for similar problems.

Upvotes: 10

Uli K&#246;hler
Uli K&#246;hler

Reputation: 13750

You can use a modified version of this duplicate removal function:

takeUntilDuplicate :: Eq a => [a] -> [a]
takeUntilDuplicate = helper []
    where helper seen [] = seen
          helper seen (x:xs)
              | x `elem` seen = seen
              | otherwise = helper (seen ++ [x]) xs

Note that elem is quite inefficient for large lists. Assuming a (the data type inside the list) is an Ordtype, this function could be improved by using Data.Set for membership lookup:

import qualified Data.Set as Set

takeUntilDuplicate' :: (Eq a, Ord a) => [a] -> [a]
takeUntilDuplicate' = helper Set.empty []
    where helper seenSet seen [] = seen
          helper seenSet seen (x:xs)
              | x `Set.member` seenSet = seen
              | otherwise = helper (Set.insert x seenSet) (seen ++ [x]) xs

If you don't care about the order of the results returned, the efficiency of the function can be improved further by returning a Set:

import qualified Data.Set as Set import Data.Set (Set)

takeUntilDuplicateSet :: (Eq a, Ord a) => [a] -> Set a
takeUntilDuplicateSet = helper Set.empty
    where helper seenSet [] = seenSet
          helper seenSet (x:xs)
              | x `Set.member` seenSet = seenSet
              | otherwise = helper (Set.insert x seenSet) xs

Upvotes: 2

pigworker
pigworker

Reputation: 43393

Your point that takeWhile doesn't work because you have no contextual information for the individual elements suggests the following strategy: get it.

This answer of mine makes reference to the decorate-with-context operation, which I called picks (because it shows you all the way to pick one element on which to focus). It's the general decorate-with-its-context operation that we just ought to have for free for every containery thing. For lists, it is

picks :: [x] -> [(x, ([x], [x]))] -- [(x-here, ([x-before], [x-after]))]
picks [] = []
picks (x : xs) = (x, ([], xs)) : [(y, (x : ys, zs)) | (y, (ys, zs)) <- picks xs]

and it works perfectly well for infinite lists, while we're about it.

Now have a go with

takeUntilDuplicate :: Eq x => [x] -> [x]
takeUntilDuplicate = map fst . takeWhile (\ (x, (ys, _)) -> not (elem x ys)) . picks

(Curiously, I'm disturbed that the above one-liner is rejected for ambiguity of Eq if not given the above type signature. I'm tempted to ask a question about it, here. Oh, it's the monomorphism restriction. How annoying.)

Remark. It makes a lot of sense to (and I normally would) represent the "elements before" component that picks delivers using snoc-lists (lists which grow on the right), the better to preserve sharing and visual left-to-right-ness.

Upvotes: 5

Related Questions