AJF

Reputation: 11913

Isn't this a double traversal?

In the "programming tips" section of the haskell wiki, I found this example:

count :: (a -> Bool) -> [a] -> Int
count p = length . filter p

This was said to be a better alternative to

count :: (a -> Bool) -> [a] -> Int
count _ [] = 0
count p (x:xs)
   | p x       = 1 + count p xs
   | otherwise =     count p xs

Which, in terms of readability, I entirely agree with.

However, isn't that a double traversal, and therefore actually worse than the explicit-recursion function? Does laziness in GHC mean that this is equivalent to a single traversal after optimisation? Which implementation is faster, and why?

Upvotes: 3

Views: 149

Answers (3)

rampion

Reputation: 89053

So to see what the optimizer actually does, let's look at the core:

% ghc -O2 -ddump-simpl Temp.hs
[1 of 1] Compiling Temp             ( Temp.hs, Temp.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 29, types: 26, coercions: 0}

Temp.count
  :: forall a_arN.
     (a_arN -> GHC.Types.Bool) -> [a_arN] -> GHC.Types.Int
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=DmdType <L,C(U)><S,1*U>,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, WorkFree=True, Expandable=True,
         Guidance=IF_ARGS [60 0] 191 0}]
Temp.count =
  \ (@ a_aMA)
    (p_arV :: a_aMA -> GHC.Types.Bool)
    (eta_B1 :: [a_aMA]) ->
    letrec {
      go_aNr [Occ=LoopBreaker]
        :: [a_aMA] -> GHC.Prim.Int# -> GHC.Types.Int
      [LclId, Arity=1, Str=DmdType <S,1*U>]
      go_aNr =
        \ (ds_aNs :: [a_aMA]) ->
          case ds_aNs of _ [Occ=Dead] {
            [] -> GHC.Types.I#;
            : y_aNx ys_aNy ->
              case p_arV y_aNx of _ [Occ=Dead] {
                GHC.Types.False -> go_aNr ys_aNy;
                GHC.Types.True ->
                  let {
                    g_a10B [Dmd=<L,C(U)>] :: GHC.Prim.Int# -> GHC.Types.Int
                    [LclId, Str=DmdType]
                    g_a10B = go_aNr ys_aNy } in
                  \ (x_a10C :: GHC.Prim.Int#) -> g_a10B (GHC.Prim.+# x_a10C 1)
              }
          }; } in
    go_aNr eta_B1 0

Cleaning it up a bit:

Temp.count :: forall aType.  (aType -> Bool) -> [aType] -> Int
Temp.count = \(@ aType) (p :: aType -> Bool) (as :: [aType]) ->
  letrec {
    go :: [aType] -> GHC.Prim.Int# -> Int
    go = \(xs :: [aType]) ->
      case xs of _ {
        [] -> I#; -- constructor to make a GHC.Prim.Int# into an Int
        : y ys ->
          case p y of _ {
            False -> go ys;
            True ->
              let {
                go' :: GHC.Prim.Int# -> Int
                go' = go ys 
              } in \(x :: GHC.Prim.Int#) -> go' (GHC.Prim.+# x 1)
          }
      }; 
  } in go as 0

Since we're operating on the unboxed type GHC.Prim.Int#, all the additions are strict, so we only have one loop through the data.
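Read back into source Haskell, the loop in the Core above behaves like a hand-fused, accumulator-passing version (a sketch; the name countFused and the boxed Int accumulator standing in for the unboxed Int# are mine, not GHC's):

```haskell
{-# LANGUAGE BangPatterns #-}

-- One pass over the list, with a strict accumulator: this is
-- roughly what GHC's fused Core for (length . filter p) does.
countFused :: (a -> Bool) -> [a] -> Int
countFused p = go 0
  where
    go !acc []     = acc
    go !acc (x:xs)
      | p x        = go (acc + 1) xs
      | otherwise  = go acc xs
```

The bang patterns force the count at every step, mirroring the strict additions on GHC.Prim.Int# in the Core, so no chain of thunks builds up.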

Upvotes: 11

Abhay

Reputation: 768

Either way, you perform one or two operations per item: checking the predicate is always necessary, and the second operation, adding 1, depends on the predicate's result.

So, if you don't consider effects like caching, both versions perform the same number of operations.

Naively, the first version looks like two separate traversals: one gathering the elements that satisfy the predicate, and one counting them. Given a list bigger than the cache can handle, this would slow down the processing, and in a strict language it actually does.

However, Haskell's laziness comes into the picture here. The list generated by filter is evaluated (comes into existence) element by element, exactly as the counting function length demands it. And since length uses the elements only for counting and keeps no reference to the newly created list, each element is immediately free to be garbage-collected. So only O(1) extra memory is occupied at any time during the calculation.
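You can see this element-by-element production directly: filter yields elements on demand, so even an infinite input list is fine as long as the consumer demands only finitely many of them (a small illustration of mine, not from the original answer):

```haskell
-- filter produces its result lazily, one element at a time,
-- so a consumer that needs only a prefix terminates even on
-- an infinite input list.
firstThreeEvens :: [Int]
firstThreeEvens = take 3 (filter even [0..])
-- take 3 forces only the first three matches: [0,2,4]
```

length forces the whole filtered list, of course, but it does so one cell at a time, which is exactly why the intermediate list never exists in full.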

There is a slight overhead of constructing the resultant "filtered" list in the first version. But this is usually negligible compared with cache effects which come in when the lists are large. (For small lists it may matter; this needs to be tested.) Further, it may be optimized away depending upon the compiler and the optimization level chosen.

Update: the second version will actually consume memory, as pointed out in one of the comments, since the non-tail-recursive 1 + count p xs builds up a chain of pending additions. For a fair comparison, you need to write it with an accumulating parameter and a strictness annotation in the right place, because (I expect) length is already written this way.
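One way to sketch that fairer explicit version is with a strict left fold, which carries the count as an accumulator and forces it at every step (my formulation of the suggestion above, not code from the answer):

```haskell
import Data.List (foldl')

-- count via a strict left fold: foldl' forces the accumulator
-- at each step, so no chain of pending additions builds up.
countAcc :: (a -> Bool) -> [a] -> Int
countAcc p = foldl' (\n x -> if p x then n + 1 else n) 0
```

This runs in constant extra space, like length itself does.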

Upvotes: 3

hasufell

Reputation: 573

You can test which version is faster with profiling, e.g.:

module Main where


main :: IO ()
main = print countme'

count' :: (a -> Bool) -> [a] -> Int
count' p = length . filter p

count :: (a -> Bool) -> [a] -> Int
count _ [] = 0
count p (x:xs)
   | p x       = 1 + count p xs
   | otherwise =     count p xs


list = [0..93823249]

countme' = count' (\x -> x `mod` 15 == 0) list
countme = count (\x -> x `mod` 15 == 0) list

Then run ghc -O2 -prof -fprof-auto -rtsopts Main.hs and ./Main +RTS -p. It will yield the file Main.prof. Then change the main function to use countme instead and compare the results. Mine are:

  • 4.12s for the implicit version
  • 6.34s for the explicit version

If you turn off optimization, then the implicit one is still slightly (but not much) faster.

Apart from the fusion and laziness, which were already explained by others, I guess another reason could be that length and filter are Prelude functions and can be better optimized by the compiler.

Upvotes: 2
