Anonymous Panther

Reputation: 59

Split a list into non-empty sub-lists in Haskell

I have to split the given list into non-empty sub-lists each of which is either in strictly ascending order, in strictly descending order, or contains all equal elements. For example, [5,6,7,2,1,1,1] should become [[5,6,7],[2,1],[1,1]].

Here is what I have done so far:

splitSort :: Ord a => [a] -> [[a]] 
splitSort ns = foldr k [] ns
  where
    k a []  = [[a]]
    k a ns'@(y:ys) | a <= head y = (a:y):ys
                   | otherwise = [a]:ns'

I think I am quite close, but when I run it on that example it outputs [[5,6,7],[2],[1,1,1]] instead of [[5,6,7],[2,1],[1,1]].
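To make the requirement concrete, here is a small checker sketch. `isValidPiece` and `isValidSplit` are hypothetical helpers, not part of the question or any answer. A piece is accepted when it is non-empty and all of its adjacent pairs compare the same way (all `LT`, all `GT` or all `EQ`), and a split is accepted when its pieces concatenate back to the input.

```haskell
-- Hypothetical checker for the requirement in the question (not part of any answer).
-- A piece is acceptable when it is non-empty and all adjacent pairs compare the
-- same way: all LT (strictly ascending), all GT (strictly descending) or all EQ.
isValidPiece :: Ord a => [a] -> Bool
isValidPiece [] = False
isValidPiece xs = allSame (zipWith compare xs (drop 1 xs))
  where
    allSame []       = True   -- a one-element piece has no pairs to compare
    allSame (o : os) = all (== o) os

-- A split is valid when it reassembles to the input and every piece is acceptable.
isValidSplit :: Ord a => [a] -> [[a]] -> Bool
isValidSplit input pieces = concat pieces == input && all isValidPiece pieces
```

Read literally, the requirement also accepts [[5,6,7],[2],[1,1,1]], since [2] is a single element and [1,1,1] is all equal. The expected output additionally takes each run as long as possible from the left, which is the "longest rally" reading made explicit in one of the answers.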

Upvotes: 4

Views: 2406

Answers (6)

Redu

Reputation: 26191

My initial attempt turned out to be lengthy and probably inefficient, but I will keep it here for consistency with the comments. You had best skip straight to the end for the answer.

Nice question, but it turns out to be a bit of a hard nut. My approach works in stages, each of which I will explain:

import Data.List (groupBy)

splitSort :: Ord a => [a] -> [[a]]
splitSort (x:xs) = (:) <$> (x :) . head <*> tail $ interim
                   where
                   pattern = zipWith compare <$> init <*> tail
                   tuples  = zipWith (,) <$> tail <*> pattern
                   groups  = groupBy (\p c -> snd p == snd c) . tuples $ (x:xs)
                   interim = groups >>= return . map fst

*Main> splitSort [5,6,7,2,1,1,1]
[[5,6,7],[2,1],[1,1]]
  • The pattern function (zipWith compare <$> init <*> tail) has type Ord a => [a] -> [Ordering]. Fed with [5,6,7,2,1,1,1], it uses zipWith to compare the init of the list with its tail, so the result is [LT,LT,GT,GT,EQ,EQ]. This is the pattern we need.
  • The tuples function takes the tail of our list and pairs its elements with the corresponding elements of the result of pattern. So we end up with [(6,LT),(7,LT),(2,GT),(1,GT),(1,EQ),(1,EQ)].
  • The groups function uses Data.List.groupBy on the second components of the tuples and generates the required sublists: [[(6,LT),(7,LT)],[(2,GT),(1,GT)],[(1,EQ),(1,EQ)]].
  • interim is where we monadically get rid of the Ordering values and the tuples. The result of interim is [[6,7],[2,1],[1,1]].
  • Finally, in the main function body, (:) <$> (x :) . head <*> tail $ interim prepends the first item of our list (x) to the first sublist (x always belongs there) and gloriously presents the solution.
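A sketch that lifts the four local bindings of the first version to top-level functions, so each stage listed above can be evaluated on its own. The names mirror the answer's bindings, but they are written with explicit arguments instead of the function applicative, with `drop 1` in place of `tail`, and with `map (map fst)` in place of `>>= return . map fst`. `splitSortStages` is a hypothetical name; unlike the first version, it also handles empty and one-element lists.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Stage 1: compare each element with its successor.
pattern :: Ord a => [a] -> [Ordering]
pattern xs = zipWith compare xs (drop 1 xs)

-- Stage 2: pair every element except the first with the Ordering leading into it.
tuples :: Ord a => [a] -> [(a, Ordering)]
tuples xs = zip (drop 1 xs) (pattern xs)

-- Stage 3: group consecutive tuples that share the same Ordering.
groups :: Ord a => [a] -> [[(a, Ordering)]]
groups = groupBy ((==) `on` snd) . tuples

-- Stage 4: drop the Orderings again.
interim :: Ord a => [a] -> [[a]]
interim = map (map fst) . groups

-- Final step: put the first element back at the front of the first group.
splitSortStages :: Ord a => [a] -> [[a]]
splitSortStages []       = []
splitSortStages xs@(x:_) = case interim xs of
  (g:gs) -> (x : g) : gs
  []     -> [[x]]          -- one-element input: no comparisons, no groups
```

In GHCi, evaluating `pattern`, `tuples`, `groups` and `interim` on [5,6,7,2,1,1,1] should reproduce the four intermediate values from the list above.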

Edit: Investigating the problem @Jonas Duregård discovered, where [0,1,0,1] results in [[0,1],[0],[1]], we can conclude that the result should contain no sublists of length 1 except possibly the last one. For an input like [0,1,0,1,0,1,0], the code above produces [[0,1],[0],[1],[0],[1],[0]], while it should produce [[0,1],[0,1],[0,1],[0]]. So I believe adding a squeeze function at the very last stage should correct the logic.

import Data.List (groupBy)

splitSort :: Ord a => [a] -> [[a]]
splitSort []     = []
splitSort [x]    = [[x]]
splitSort (x:xs) = squeeze $ (:) <$> (x :) . head <*> tail $ interim
                   where
                   pattern = zipWith compare <$> init <*> tail
                   tuples  = zipWith (,) <$> tail <*> pattern
                   groups  = groupBy (\p c -> snd p == snd c) $ tuples (x:xs)
                   interim = groups >>= return . map fst

                   squeeze []           = []
                   squeeze [y]          = [y]
                   squeeze ([n]:[m]:ys) = [n,m] : squeeze ys
                   squeeze ([n]:(m1:m2:ms):ys) | compare n m1 == compare m1 m2 = (n:m1:m2:ms) : squeeze ys
                                               | otherwise                     = [n] : (m1:m2:ms) : squeeze ys
                   squeeze (y:ys)       = y : squeeze ys

*Main> splitSort [0,1, 0, 1, 0, 1, 0]
[[0,1],[0,1],[0,1],[0]]
*Main> splitSort [5,6,7,2,1,1,1]
[[5,6,7],[2,1],[1,1]]
*Main> splitSort [0,0,1,0,-1]
[[0,0],[1,0,-1]]

Yes, as you will probably agree, the code has turned out a little too lengthy and possibly not very efficient.

The Answer: I have to trust the back of my head when it keeps telling me I am not on the right track. Sometimes, as in this case, the problem reduces to a single if-then-else, much simpler than I had initially anticipated.

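-- mo is the Ordering of the run p belongs to (compare previous p),
-- or Nothing when p starts a new run.
runner :: Ord a => Maybe Ordering -> [a] -> [[a]]
runner _       []  = []
runner _       [p] = [[p]]
runner mo (p:q:rs) = let mo'    = Just (compare p q)
                         (s:ss) = runner mo' (q:rs)   -- lazy: only matched when used
                     in if mo == mo' || mo == Nothing then (p:s):ss                    -- q goes into p's run
                                                      else [p] : runner Nothing (q:rs) -- Ordering changes: p ends its run

splitSort :: Ord a => [a] -> [[a]]
splitSort = runner Nothing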

My test cases

*Main> splitSort [0,1, 0, 1, 0, 1, 0]
[[0,1],[0,1],[0,1],[0]]
*Main> splitSort [5,6,7,2,1,1,1]
[[5,6,7],[2,1],[1,1]]
*Main> splitSort [0,0,1,0,-1]
[[0,0],[1,0,-1]]
*Main> splitSort [1,2,3,5,2,0,0,0,-1,-1,0]
[[1,2,3,5],[2,0],[0,0],[-1,-1],[0]]

Upvotes: 1

Yann Vernier

Reputation: 15887

My initial thought looks like:

ordruns :: Ord a => [a] -> [[a]]
ordruns = foldr extend []
  where
    extend a [                    ] = [      [a]      ]
    extend a (    [b]       : runs) =       [a,b]   : runs
    extend a (run@(b:c:etc) : runs)
      | compare a b == compare b c  =       (a:run) : runs
      | otherwise                   = [a] : run     : runs

This eagerly fills from the right while keeping the same Ordering for all neighbouring pairs within each sublist. Thus only the first sublist can end up with a single item in it.

The thought process is this: an Ordering describes the three types of subsequence we're looking for: ascending LT, equal EQ or descending GT. Keeping it the same every time we add another item means it holds throughout the subsequence, so we know we need to start a new run whenever the Ordering does not match. Furthermore, it's impossible to compare 0 or 1 items, so every run we create contains at least one item, and if a run has only one item we always add the new item to it.

We could add more rules, such as a preference for filling left or right. A reasonable optimization is to store the ordering for a sequence instead of comparing the leading two items twice per item. And we could also use more expressive types. I also think this version is inefficient (and inapplicable to infinite lists) due to the way it collects from the right; that was mostly so I could use cons (:) to build the lists.
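A sketch of the optimization mentioned above, storing each run together with its Ordering so that adding an element costs one comparison instead of two. `ordrunsTagged` is a hypothetical name; the representation (`Nothing` for a run that has only one element so far) is an assumption, not part of the answer.

```haskell
-- Each run is stored with its Ordering; Nothing marks a one-element run.
ordrunsTagged :: Ord a => [a] -> [[a]]
ordrunsTagged = map snd . foldr extend []
  where
    extend a [] = [(Nothing, [a])]
    -- A one-element run accepts any new item; the pair fixes the Ordering.
    extend a ((Nothing, run@(b:_)) : runs) = (Just (compare a b), a : run) : runs
    -- A longer run only accepts the item if the Ordering is unchanged.
    extend a ((Just o, run@(b:_)) : runs)
      | compare a b == o = (Just o, a : run) : runs
      | otherwise        = (Nothing, [a]) : (Just o, run) : runs
    -- Runs are never empty, so this case cannot occur.
    extend a ((_, []) : runs) = (Nothing, [a]) : runs
```

Like ordruns, it fills from the right, so on the question's example both give [[5,6],[7,2],[1,1,1]]. That split satisfies the stated constraints but differs from the left-greedy [[5,6,7],[2,1],[1,1]].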

Second thought: I could collect the lists from the left using plain recursion.

ordruns :: Ord a => [a] -> [[a]]
ordruns [] = []
ordruns [a] = [[a]]
ordruns (a1:a2:as) = run:runs
  where
    runs = ordruns rest
    order = compare a1 a2
    run = a1:a2:runcontinuation
    (runcontinuation, rest) = collectrun a2 order as
    collectrun _ _ [] = ([], [])
    collectrun last order (a:as)
      | order == compare last a =
          let (more,rest) = collectrun a order as
          in (a:more, rest)
      | otherwise = ([], a:as)

More exercises. What if we build the list of comparisons just once, for use in grouping?

import Data.List

ordruns3 [] = []
ordruns3 [a] = [[a]]
ordruns3 xs = unfoldr collectrun marked
  where
    pairOrder = zipWith compare xs (tail xs)
    marked = zip (head pairOrder : pairOrder) xs
    collectrun [] = Nothing
    collectrun ((o,x):xs) = Just (x:map snd markedgroup, rest)
      where (markedgroup, rest) = span ((o==).fst) xs

Then there is the fact that Data.List has groupBy :: (a -> a -> Bool) -> [a] -> [[a]] but no groupOn :: Eq b => (a -> b) -> [a] -> [[a]]. We can use a wrapper type to work around that.

import Data.List

data Grouped t = Grouped Ordering t
instance Eq (Grouped t) where
  (Grouped o1 _) == (Grouped o2 _) = o1 == o2
ordruns4 [] = []
ordruns4 [a] = [[a]]
ordruns4 xs = unmarked
  where
    pairOrder = zipWith compare xs (tail xs)
    marked = group $ zipWith Grouped (head pairOrder : pairOrder) xs
    unmarked = map (map (\(Grouped _ t) -> t)) marked

Of course, the wrapper type's test can be converted into a function to use groupBy instead:

import Data.List

ordruns5 [] = []
ordruns5 [a] = [[a]]
ordruns5 xs = map (map snd) marked
  where
    pairOrder = zipWith compare xs (tail xs)
    marked = groupBy (\a b -> fst a == fst b) $
               zip (head pairOrder : pairOrder) xs

These marking versions arrive at the same decoration concept Jonas Duregård applied.
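The missing groupOn is a one-liner on top of groupBy and `Data.Function.on`. A sketch with hypothetical names `groupOn` and `ordruns6` (neither is in Data.List), reusing the decoration from ordruns5 unchanged:

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical helper: group consecutive elements that have the same key.
groupOn :: Eq k => (a -> k) -> [a] -> [[a]]
groupOn key = groupBy ((==) `on` key)

-- ordruns5 rewritten with groupOn; same decoration, same results.
ordruns6 :: Ord a => [a] -> [[a]]
ordruns6 []  = []
ordruns6 [a] = [[a]]
ordruns6 xs  = map (map snd) (groupOn fst (zip (head pairOrder : pairOrder) xs))
  where
    -- Ordering of each neighbouring pair; the first element borrows the first one.
    pairOrder = zipWith compare xs (tail xs)
```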

Upvotes: 0

Will Ness

Reputation: 71119

Every ordered prefix is already in some order, and you don't care in which, as long as it is the longest:

import Data.List (group, unfoldr)

foo :: Ord t => [t] -> [[t]]
foo = unfoldr f
  where
  f []  = Nothing
  f [x] = Just ([x], [])
  f xs  = Just $ splitAt (length g + 1) xs
            where 
            (g : _) = group $ zipWith compare xs (tail xs)

The length can be fused in, so that splitAt essentially counts in unary and the function is not unnecessarily strict (as Jonas Duregård rightly commented):

  ....
  f xs  = Just $ foldr c z g xs
            where 
            (g : _) = group $ zipWith compare xs (tail xs)
            c _ r (x:xs) = let { (a,b) = r xs } in (x:a, b)
            z     (x:xs) = ([x], xs)
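For reference, here is the fused variant assembled into one definition, assuming the lines elided by "...." are the same as in the first version. `fooLazy` is a hypothetical name. Each Ordering in the first group g peels off one element, and the run holds one element more than it has comparisons, which the base case `z` supplies.

```haskell
import Data.List (group, unfoldr)

fooLazy :: Ord t => [t] -> [[t]]
fooLazy = unfoldr f
  where
    f []  = Nothing
    f [x] = Just ([x], [])
    f xs  = Just (foldr c z g xs)
      where
        -- g: the first block of equal Orderings, i.e. the longest ordered prefix
        (g : _) = group (zipWith compare xs (tail xs))
        -- one step per Ordering in g: keep the current element, continue lazily
        c _ r (x:rest) = let (a, b) = r rest in (x : a, b)
        -- base case: one final element closes the run
        z (x:rest) = ([x], rest)
```

The difference shows up on inputs whose first run is infinite. With the length version, `head (foo (repeat 1))` never returns, because length has to see the whole first group. The fused version hands out the run's elements one at a time, so `take 5 (head (fooLazy (repeat 1)))` does return.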

Upvotes: 1

assembly.jc

Reputation: 2076

I wonder whether this problem can be solved using foldr if it has to split and group a list from

 [5,6,7,2,1,1,1]

to

 [[5,6,7],[2,1],[1,1]]

instead of

 [[5,6,7],[2],[1,1,1]]

The problem is that at each step of foldr we only know the sorted sub-lists on the right-hand side and the number being processed. For example, after reading [1,1] of [5,6,7,2,1,1,1], at the next step we have

1, [[1, 1]] 

There is not enough information to decide whether to start a new group [1] or to add 1 to the group [1,1].

Therefore we can construct the required sorted sub-lists by reading the elements of the list from left to right, which is why foldl is used. Here is a solution that is not optimized for speed.

EDIT: Following the problems @Jonas Duregård pointed out in a comment, some redundant code has been removed. Beware that this is not an efficient solution.

splitSort::Ord a=>[a]->[[a]]
splitSort numList = foldl step [] numList
    where step [] n       = [[n]]
          step sublists n = groupSublist (init sublists) (last sublists) n

          groupSublist sublists [n1] n2 = sublists ++ [[n1, n2]]
          groupSublist sublists sortedList@(n1:n2:ns) n3
            | isEqual n1 n2 = groupIf (isEqual n2 n3) sortedList n3 
            | isAscen n1 n2 = groupIfNull isAscen sortedList n3
            | isDesce n1 n2 = groupIfNull isDesce sortedList n3
            | otherwise     = mkNewGroup sortedList n3
            where groupIfNull check sublist@(n1:n2:ns) n3
                    | null ns   = groupIf (check n2 n3) [n1, n2] n3
                    | otherwise = groupIf (check (last ns) n3) sublist n3
                  groupIf isGroup | isGroup   = addToGroup
                                  | otherwise = mkNewGroup
                  addToGroup gp n = sublists ++ [(gp ++ [n])]
                  mkNewGroup gp n = sublists ++ [gp] ++ [[n]]

          isEqual x y = x == y
          isAscen x y = x < y
          isDesce x y = x > y
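The solution above rebuilds the accumulator with init, last and ++ at every step. A sketch of the same left-to-right idea with a different accumulator: this is a swapped-in design, not the answer's code, and `splitSortL` is a hypothetical name. It keeps the current run reversed together with its Ordering, so each step is constant time apart from reversing a run once when it is closed.

```haskell
import Data.List (foldl')

-- State: (Ordering of the current run if known, current run reversed,
--         finished runs in reverse order).
splitSortL :: Ord a => [a] -> [[a]]
splitSortL []     = []
splitSortL (x:xs) = finish (foldl' step (Nothing, [x], []) xs)
  where
    step (mo, cur, done) n = case cur of
      [] -> (Nothing, [n], done)                  -- cannot happen: runs start non-empty
      (prev:_) ->
        let o = compare prev n
        in case mo of
             Nothing -> (Just o, n : cur, done)   -- second element fixes the Ordering
             Just o'
               | o == o'   -> (mo, n : cur, done) -- same Ordering: extend the run
               | otherwise -> (Nothing, [n], reverse cur : done)  -- close run, start new one
    finish (_, cur, done) = reverse (reverse cur : done)
```

Because a new run always takes its second element unconditionally, this produces the longest runs from the left, for example [[0,1],[0,1]] for [0,1,0,1].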

Upvotes: 0

lsmor

Reputation: 5063

For this solution I am making the assumption that you want the "longest rally". By that I mean:

splitSort [0, 1, 0, 1] = [[0,1], [0,1]]    -- This is OK
splitSort [0, 1, 0, 1] = [[0,1], [0], [1]] -- This is not OK despite fitting your requirements

Essentially, there are two pieces:

  • First, split the list into two parts (a, b). Part a is the longest rally, given the order of the first two elements. Part b is the rest of the list.
  • Second, apply splitSort to b and put a in front of the result, giving one list of lists.

Taking the longest rally is surprisingly messy but straightforward. Given the list x:y:xs, by construction x and y belong to the rally. Whether the elements in xs belong to it depends on whether they follow the Ordering of x and y. To check this, you zip every element with the Ordering it has when compared against its previous element, and split the list where the Ordering changes (edge cases are pattern matched). In code:

import Data.List
import Data.Function

-- This function split the list in two (Longest Rally, Rest of the list)
splitSort' :: Ord a => [a] -> ([a], [a])
splitSort'        []  = ([], [])
splitSort'     (x:[]) = ([x],[])
splitSort' l@(x:y:xs) = case span ( (o ==) . snd) $ zip (y:xs) relativeOrder of
                            (f, s) -> (x:map fst f, map fst s)
  where relativeOrder = zipWith compare (y:xs) l
        o = compare y x

-- This applies the previous recursively
splitSort :: Ord a => [a] -> [[a]]
splitSort        []  = []
splitSort     (x:[]) = [[x]]
splitSort   (x:y:[]) = [[x,y]]
splitSort l@(x:y:xs) = fst sl:splitSort (snd sl)
  where sl = splitSort' l
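A sketch that exposes the intermediate values of splitSort' on the question's example. `rallyPieces` is a hypothetical helper with the same where bindings as splitSort'. Note that relativeOrder compares as compare current previous, so an ascending rally is tagged GT rather than LT; this is consistent because o is computed with the same argument order (compare y x).

```haskell
-- Hypothetical helper: the tagged list that splitSort' spans over,
-- together with the (longest rally, rest) pair it produces.
rallyPieces :: Ord a => [a] -> ([(a, Ordering)], ([a], [a]))
rallyPieces l@(x:y:xs) = (tagged, (x : map fst f, map fst s))
  where
    relativeOrder = zipWith compare (y:xs) l  -- compare current previous
    tagged        = zip (y:xs) relativeOrder
    o             = compare y x               -- Ordering of the first pair
    (f, s)        = span ((o ==) . snd) tagged
rallyPieces l = ([], (l, []))                 -- fewer than two elements
```

For [5,6,7,2,1,1,1] the tagged list is [(6,GT),(7,GT),(2,LT),(1,LT),(1,EQ),(1,EQ)], and the split is ([5,6,7],[2,1,1,1]).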

Upvotes: 0

Jonas Duregård

Reputation: 937

Here is a kinda ugly solution, with three reverses in one line of code :).

addElement :: Ord a => a -> [[a]] -> [[a]]
addElement a []  = [[a]]
addElement a (x:xss) = case x of
  (x1:x2:xs) 
    | any (check a x1 x2) [(==),(<),(>)] -> (a:x1:x2:xs):xss
    | otherwise -> [a]:(x:xss)
  _  -> (a:x):xss
  where 
    check x1 x2 x3 op = (x1 `op` x2) && (x2 `op` x3) 

splitSort xs = reverse $ map reverse $ foldr addElement [] (reverse xs)

You can possibly get rid of all the reversing if you modify addElement a bit.

EDIT: Here is a less reversing version (even works for infinite lists):

splitSort2 []         = []
splitSort2 [x]        = [[x]]
splitSort2 (x:y:xys)  = (x:y:map snd here):splitSort2 (map snd later)
  where 
    (here,later) = span ((==c) . uncurry compare) (zip (y:xys) xys)
    c            = compare x y  

EDIT 2: Finally, here is a solution based on a single decorating/undecorating pass, which avoids comparing any two values more than once and is probably a lot more efficient.

splitSort xs = go (decorate xs) where
  decorate :: Ord a => [a] -> [(Ordering,a)]
  decorate xs = zipWith (\x y -> (compare x y,y)) (undefined:xs) xs

  go :: [(Ordering,a)] -> [[a]]
  go ((_,x):(c,y):xys)  = let (here, later) = span ((==c) . fst) xys in 
                              (x : y : map snd here) : go later
  go xs = map (return . snd) xs -- Deal with both base cases
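The `undefined` in decorate is never forced, because go ignores the decoration of the first element of each run. If you prefer to avoid it altogether, here is a sketch with the hypothetical name `splitSortTotal`. It keeps the head element undecorated and otherwise follows the same single decorate/undecorate pass. Like splitSort2, it is lazy enough for infinite lists.

```haskell
-- Each later element is tagged with (compare previous current);
-- the first element of every run is passed to go without a tag.
splitSortTotal :: Ord a => [a] -> [[a]]
splitSortTotal []     = []
splitSortTotal (x:xs) = go x (zipWith (\prev cur -> (compare prev cur, cur)) (x:xs) xs)
  where
    -- h starts the current run; the tag of its successor fixes the run's Ordering.
    go h [] = [[h]]
    go h ((c, y) : rest) =
      let (here, later) = span ((== c) . fst) rest
      in (h : y : map snd here) : case later of
           []              -> []
           ((_, z) : more) -> go z more   -- z's own tag is ignored, like (_,x) above
```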

Upvotes: 1
