sanic
sanic

Reputation: 2085

Haskell: Removing duplicates tuples from a list?

I'm trying to get from the before to after state. Is there a convenient Haskell function for removing duplicate tuples from a list? Or perhaps it is something a bit more complicated such as iterating through the entire list?

Before: the list of tuples, sorted by word, as in
   [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]
After: the list of sorted tuples with exact duplicates removed, as in
   [(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]

Upvotes: 1

Views: 2321

Answers (2)

behzad.nouri
behzad.nouri

Reputation: 77951

Searching for Eq a => [a] -> [a] on hoogle, returns nub function:

The nub function removes duplicate elements from a list. In particular, it keeps only the first occurrence of each element. (The name nub means `essence'.)

As in the documentation the more general case is nubBy.

That said, this is an O(n^2) algorithm and may not be very efficient. An alternative would be to use Data.Set.fromList if the values are an instance of Ord type-class, as in:

import qualified Data.Set as Set

nub' :: Ord a => [a] -> [a]
nub' = Set.toList . Set.fromList

though this will not maintain the order of the original list.

A simple set style solution which maintains the order of the original list can be:

import Data.Set (Set, member, insert, empty)

nub' :: Ord a => [a] -> [a]
nub' = reverse . fst . foldl loop ([], empty)
    where
    loop :: Ord a => ([a], Set a) -> a -> ([a], Set a)
    loop acc@(xs, obs) x
        | x `member` obs = acc
        | otherwise = (x:xs, x `insert` obs)

Upvotes: 6

dfeuer
dfeuer

Reputation: 48591

If you want to define a version of nub for Ord, I recommend using

nub' :: Ord a => [a] -> [a]
nub' xs = foldr go (`seq` []) xs empty
  where
    go x r obs
      | x `member` obs = r obs
      | otherwise = obs' `seq` x : r obs'
      where obs' = x `insert` obs

To see what this is doing, you can get rid of the foldr:

nub' :: Ord a => [a] -> [a]
nub' xs = nub'' xs empty
  where
    nub'' [] obs = obs `seq` []
    nub'' (y : ys) obs
      | y `member` obs = nub'' ys obs
      | otherwise = obs' `seq` y : nub'' ys obs'
      where obs' = y `insert` obs

One key point about this implementation, as opposed to behzad.nouri's, is that it produces elements lazily, as they are consumed. This is generally much better for cache utilization and garbage collection, as well as using a constant factor less memory than the reversing algorithm.

Upvotes: 4

Related Questions