Reputation: 2085
I'm trying to get from the before to after state. Is there a convenient Haskell function for removing duplicate tuples from a list? Or perhaps it is something a bit more complicated such as iterating through the entire list?
Before: the list of tuples, sorted by word, as in
[(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]
After: the list of sorted tuples with exact duplicates removed, as in
[(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]
Upvotes: 1
Views: 2321
Reputation: 77951
Searching for Eq a => [a] -> [a]
on hoogle, returns nub
function:
The nub function removes duplicate elements from a list. In particular, it keeps only the first occurrence of each element. (The name nub means `essence'.)
As in the documentation the more general case is nubBy
.
That said, this is an O(n^2)
algorithm and may not be very efficient. An alternative would be to use Data.Set.fromList
if the values are an instance of Ord
type-class, as in:
import qualified Data.Set as Set
nub' :: Ord a => [a] -> [a]
nub' = Set.toList . Set.fromList
though this will not maintain the order of the original list.
A simple set style solution which maintains the order of the original list can be:
import Data.Set (Set, member, insert, empty)
nub' :: Ord a => [a] -> [a]
nub' = reverse . fst . foldl loop ([], empty)
where
loop :: Ord a => ([a], Set a) -> a -> ([a], Set a)
loop acc@(xs, obs) x
| x `member` obs = acc
| otherwise = (x:xs, x `insert` obs)
Upvotes: 6
Reputation: 48591
If you want to define a version of nub
for Ord
, I recommend using
nub' :: Ord a => [a] -> [a]
nub' xs = foldr go (`seq` []) xs empty
where
go x r obs
| x `member` obs = r obs
| otherwise = obs' `seq` x : r obs'
where obs' = x `insert` obs
To see what this is doing, you can get rid of the foldr
:
nub' :: Ord a => [a] -> [a]
nub' xs = nub'' xs empty
where
nub'' [] obs = obs `seq` []
nub'' (y : ys) obs
| y `member` obs = nub'' ys obs
| otherwise = obs' `seq` y : nub'' ys obs'
where obs' = y `insert` obs
One key point about this implementation, as opposed to behzad.nouri's, is that it produces elements lazily, as they are consumed. This is generally much better for cache utilization and garbage collection, as well as using a constant factor less memory than the reversing algorithm.
Upvotes: 4