EarthenSky
EarthenSky

Reputation: 196

Clean syntax for conditionally folding a list in Haskell

I'm relatively new to haskell, but in my searching I couldn't find an easy way to conditionally fold a list. i.e. When an element satisfies a condition (like in filter) to fold that element by a function (like foldr and foldl).

My workaround was to write the following helper function, then apply map to change the resulting list of pairs as my situation required.

-- This function returns tuples containing the elements which 
--    satisfy `cond` folded right, adding 1 to the second value
-- in each pair. (`snd` pair starts at 0)
-- Condition takes a single value (similar to `filter`)
-- NOTE: list cannot end with token
foldrOn cond list = 
    if (length list) > 0 then 
        if cond (head list) then 
            do
                let tmp = foldrOn cond (tail list)
                (fst (head tmp), snd (head tmp) + 1) : (tail tmp) 
                      -- fold token into char after it 
        else
            (head list, 0) : (foldrOn cond (tail list)) 
                      -- don't fold token
    else
        [] -- base case len list = 0

foldlOn cond list = ...

For example, the use-case would be something along the lines of wanting to remove the zeros in the following lists but remember how many were removed between each value.

-- the second value in each resultant pair represents the number of 
-- zeroes preceding the corresponding first value in the original list.
foldrOn (== 0) [1,0,0,0,0,0,1,0,0,0,1] -- [(1,0),(1,5),(1,3)]
foldrOn (== 0) [1,0,0,12,0,13] -- [(1,0),(12,2),(13,1)]

Is there a better way to accomplish this? Additionally, can this be done more optimally?

Upvotes: 3

Views: 999

Answers (3)

dfeuer
dfeuer

Reputation: 48591

Let's start by using pattern matching to make your own implementation more idiomatic, more obviously correct, and also (much) faster. We can also use guards in an idiomatic fashion rather than if/then/else; this is rather less important. There's also no reason to use do here, so we won't.

foldrOn _cond [] = []
foldrOn cond (hd : tl)
  | cond hd
  = case foldrOn cond tl of
      (x, y) : tl' -> (x, y + 1) : tl' 
                      -- fold token into char after it
      [] -> error "String ended on token."
  | otherwise
  = (hd, 0) : foldrOn cond tl
                      -- don't fold token

This is ... okay. But as Will Ness suggests, we don't actually gain anything by consing an "incomplete" element onto the result list. We can instead count up the cond-satisfying tokens until we reach the end of the block, and then produce a complete element. I think this makes the code a little easier to understand, and it should also run a little bit faster.

foldrOn cond = go 0
  where
    go count (hd : tl)
      | cond hd
      = go (count + 1) tl -- Don't produce anything; just bump the count
      | otherwise
      = (hd, count) : go 0 tl -- Produce the element and the count; reset the count to 0
    go count []
      | count == 0
      = []
      | otherwise
      = error "List ended on a token."

To actually run faster, you might need to tell the compiler explicitly that you really want to calculate the counts. You probably don't need to understand this part just yet, but it looks like this:

-- At the top of the file, add this line:
{-# LANGUAGE BangPatterns #-}

foldrOn cond = go 0
  where
    go !count (hd : tl)
      | cond hd
      = go (count + 1) tl -- Don't produce anything; just bump the count
      | otherwise
      = (hd, count) : go 0 tl -- Produce the element and the count; reset the count to 0
    go count []
      | count == 0
      = []
      | otherwise
      = error "List ended on a token."

This can be written as a fold in the manner Will Ness demonstrates.

Note: while it's possible to avoid the BangPatterns language extension, doing so is a bit annoying.

Upvotes: 2

Will Ness
Will Ness

Reputation: 71075

First of all,

foldrOn :: Num t => (a -> Bool) -> [a] -> [(a, t)]
-- foldrOn (== 0) [1,0,0,0,0,0,1,0,0,0,1] -- [(1,0),(1,5),(1,3)]
foldrOn p xs  =  foldr g [] xs
  where
  g x []        = [(x,0)]
  g x ((y,n):r)
    | p x       = ((y,n+1):r)
  g x r         = ((x,0):r)

This is the simplest, though it is recursive, i.e. will force the whole list to the end before starting returning its result.

To make it maximally lazy we'd have to use a lazy left fold. The skipping over the p-satisfying elements is still a recursive step, but at least the process will pause between each such span.

Lazy left fold is usually implemented as a foldr with additional argument being passed left to right along the list:

foldlOn :: Num t => (a -> Bool) -> [a] -> [(a, t)]
-- foldlOn (== 0) [1,0,0,0,0,0,1,0,0,0,1] -- [(1,0),(1,5),(1,3)]
foldlOn p xs  =  foldr g z xs 0
  where
  g x r i | p x       =         r (i+1)
          | otherwise = (x,i) : r 0
  z    _i             =         []

Or you could combine span/break and unfoldr to do the same.

You might find a way to use groupBy with some post-processing step:

GHCi> groupBy (\a b -> (==0) b) [1,0,0,0,0,0,1,0,0,0,1]
[[1,0,0,0,0,0],[1,0,0,0],[1]]

GHCi> groupBy (const (==0)) [1,2,0,0,1,0,1]
[[1],[2,0,0],[1,0],[1]]

Finishing this should not be a problem.

Upvotes: 3

radrow
radrow

Reputation: 7139

You can always bring some builtin machinery. The Data.List library is quite powerful:

import Data.List(mapAccumL)
import Data.Maybe(catMaybes)

foldrOn cond = catMaybes . snd . mapAccumL combine 0 where
  combine a el =
    if cond el then (a + 1, Nothing)
    else (0, Just (el, a))

What's going on

Essentially, foldrOn cond is a composition of the following functions:

  • mapAccumL combine 0 which advances along the list modifying each element by information about the number of recently skipped entities (starting the count at 0 and resetting it whenever we find something that doesn't match the cond predicate).
  • snd which discards the final state from the mapAccumL's result
  • catMaybes which removes the Maybe layer and leaves only the "present" values.

Upvotes: 3

Related Questions