Tyler Joe
Tyler Joe

Reputation: 377

Haskell function to keep the repeating elements of a list

Here is the expected input/output:

repeated "Mississippi" == "ips"

repeated [1,2,3,4,2,5,6,7,1] == [1,2]

repeated " " == " "

And here is my code so far:

repeated :: String -> String
repeated "" = "" 
repeated x = group $ sort x 

I know that the last part of the code doesn't work. I was thinking to sort the list then group it, then I wanted to make a filter on the list of list which are greater than 1, or something like that.

Upvotes: 0

Views: 1075

Answers (3)

AntC
AntC

Reputation: 2806

Here's some other approaches, to evaluate @chepner's comment on the solution using group $ sort. (Those solutions look simpler, because some of the complexity is hidden in the library routines.)

While it's true that sorting is O(n lg n), ...

It's not just the sorting but especially the group: that uses span, and both of them build and destroy temporary lists. I.e. they do this:

a linear traversal of an unsorted list will require some other data structure to keep track of all possible duplicates, and lookups in each will add to the space complexity at the very least. While carefully chosen data structures could be used to maintain an overall O(n) running time, the constant would probably make the algorithm slower in practice than the O(n lg n) solution, ...

group/span adds considerably to that complexity, so O(n lg n) is not a correct measure.

while greatly complicating the implementation.

The following all traverse the input list just once. Yes they build auxiliary lists. (Probably a Set would give better performance/quicker lookup.) They maybe look more complex, but to compare apples with apples look also at the code for group/span.

repeated2, repeated3, repeated4 :: Ord a => [a] -> [a]
  • repeated2/inserter2 builds an auxiliary list of pairs [(a, Bool)], in which the Bool is True if the a appears more than once, False if only once so far.

    repeated2 xs = sort $ map fst $ filter snd $ foldr inserter2 [] xs
    
    inserter2 :: Ord a => a -> [(a, Bool)] -> [(a, Bool)]
    inserter2 x [] = [(x, False)]
    inserter2 x (xb@(x', _): xs)
        | x == x'   = (x', True): xs
        | otherwise = xb: inserter2 x xs
    
  • repeated3/inserter3 builds an auxiliary list of pairs [(a, Int)], in which the Int counts how many of the a appear. The aux list is sorted anyway, just for the heck of it.

    repeated3 xs = map fst $ filter ((> 1).snd) $ foldr inserter3 [] xs
    
    inserter3 :: Ord a => a -> [(a, Int)] -> [(a, Int)]
    inserter3 x [] = [(x, 1)]
    inserter3 x xss@(xc@(x', c): xs) = case x `compare` x' of
        { LT -> ((x, 1): xss)
        ; EQ -> ((x', c+1): xs)
        ; GT -> (xc: inserter3 x xs)
        }
    
  • repeated4/go4 builds an output list of elements known to repeat. It maintains an intermediate list of elements met once (so far) as it traverses the input list. If it meets a repeat: it adds that element to the output list; deletes it from the intermediate list; filters that element out of the tail of the input list.

    repeated4 xs = sort $ go4 [] [] xs
    
    go4 :: Ord a => [a] -> [a] -> [a] -> [a]
    go4 repeats _ [] = repeats
    go4 repeats onces (x: xs) = case findUpd x onces of
        { (True, oncesU) -> go4 (x: repeats) oncesU (filter (/= x) xs)
        ; (False, oncesU) -> go4 repeats oncesU xs
        }
    
    findUpd :: Ord a => a -> [a] -> (Bool, [a])
    findUpd x [] = (False, [x])
    findUpd x (x': os) | x == x' = (True, os)      -- i.e. x' removed
                       | otherwise =
                         let (b, os') = findUpd x os in (b, x': os')
    

    (That last bit of list-fiddling in findUpd is very similar to span.)

Upvotes: 0

Davislor
Davislor

Reputation: 15144

Okay, good start. One immediate problem is that the specification requires the function to work on lists of numbers, but you define it for strings. The list must be sorted, so its elements must have the typeclass Ord. Therefore, let’s fix the type signature:

repeated :: Ord a => [a] -> [a]

After calling sort and group, you will have a list of lists, [[a]]. Let’s take your idea of using filter. That works. Your predicate should, as you said, check the length of each list in the list, then compare that length to 1.

Filtering a list of lists gives you a subset, which is another list of lists, of type [[a]]. You need to flatten this list. What you want to do is map each entry in the list of lists to one of its elements. For example, the first. There’s a function in the Prelude to do that.

So, you might fill in the following skeleton:

module Repeated (repeated) where

import Data.List (group, sort)

repeated :: Ord a => [a] -> [a]
repeated = map _
         . filter (\x -> _)
         . group
         . sort 

I’ve written this in point-free style with the filtering predicate as a lambda expression, but many other ways to write this are equally good. Find one that you like! (For example, you could also write the filter predicate in point-free style, as a composition of two functions: a comparison on the result of length.)

When you try to compile this, the compiler will tell you that there are two typed holes, the _ entries to the right of the equal signs. It will also tell you the type of the holes. The first hole needs a function that takes a list and gives you back a single element. The second hole needs a Boolean expression using x. Fill these in correctly, and your program will work.

Upvotes: 2

chi
chi

Reputation: 116139

Your code already does half of the job

> group $ sort "Mississippi"
["M","iiii","pp","ssss"]

You said you want to filter out the non-duplicates. Let's define a predicate which identifies the lists having at least two elements:

atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _       = False

Using this:

> filter atLeastTwo . group $ sort "Mississippi"
["iiii","pp","ssss"]

Good. Now, we need to take only the first element from such lists. Since the lists are non-empty, we can use head safely:

> map head . filter atLeastTwo . group $ sort "Mississippi"
"ips"

Alternatively, we could replace the filter with filter (\xs -> length xs >= 2) but this would be less efficient.

Yet another option is to use a list comprehension

> [ x | (x:_y:_) <- group $ sort "Mississippi" ]
"ips"

This pattern matches on the lists starting with x and having at least another element _y, combining the filter with taking the head.

Upvotes: 4

Related Questions