Reputation: 787
I have been experimenting with the following Haskell code:
data Foo = Foo
{ fooMin :: Float
, fooMax :: Float
, fooSum :: Float
} deriving Show
getLocalFoo :: [Float] -> Foo
getLocalFoo x = Foo a b c
where
a = minimum x
b = maximum x
c = sum x
getGlobalFoo :: [Foo] -> Foo
getGlobalFoo x = Foo a b c
where
a = minimum $ fmap fooMin x
b = maximum $ fmap fooMax x
c = sum $ fmap fooSum x
main :: IO()
main = do
let numItems = 2000
let numLists = 100000
putStrLn $ "numItems: " ++ show numItems
putStrLn $ "numLists: " ++ show numLists
-- Create an infinite list of lists of floats, x is [[Float]]
let x = take numLists $ repeat [1.0 .. numItems]
-- Print two first elements of each item
print $ take 2 (map (take 2) x)
-- First calculate local min/max/sum for each float list
-- then calculate the global min/max/sum based on the results.
print . getGlobalFoo $ fmap getLocalFoo x
And sequentially tested runtime when adjusting numItems and numLists:
Low size:
numItems: 4.0
numLists: 2
[[1.0,2.0],[1.0,2.0]]
Foo {fooMin = 1.0, fooMax = 4.0, fooSum = 20.0}
real 0m0.005s
user 0m0.004s
sys 0m0.001s
High size:
numItems: 2000.0
numLists: 100000
[[1.0,2.0],[1.0,2.0]]
Foo {fooMin = 1.0, fooMax = 2000.0, fooSum = 1.9999036e11}
real 0m33.116s
user 0m33.005s
sys 0m0.109s
I have written this code in a in my opinion intuitive and naive way without consideration to performance, however I am concerned that this is far from optimal code as I may actually be folding through the lists way more times then necessary?
Could anyone suggest a better implementation of this test?
Upvotes: 0
Views: 661
Reputation: 35089
Use the foldl
library to run multiple folds efficiently in a single pass. In fact, it's so good at this that you don't need to split your list into sublists. You can just concatenate all the lists together into one giant list and fold that directly.
Here's how:
import Control.Applicative
import qualified Control.Foldl as L
data Foo = Foo
{ fooMin :: Maybe Float
, fooMax :: Maybe Float
, fooSum :: Float
} deriving Show
foldFloats :: L.Fold Float Foo
foldFloats = Foo <$> L.minimum <*> L.maximum <*> L.sum
-- or: foldFloats = liftA3 Foo L.minimum L.maximum L.sum
main :: IO()
main = do
let numItems = 2000
let numLists = 100000
putStrLn $ "numItems: " ++ show numItems
putStrLn $ "numLists: " ++ show numLists
-- Create an infinite list of lists of floats, x is [[Float]]
let x = replicate numLists [1.0 .. numItems]
-- Print two first elements of each item
print $ take 2 (map (take 2) x)
print $ L.fold foldFloats (concat x)
The main differences from your code are:
I use replicate n
, which is the same thing as take n . repeat
. In fact, that's how replicate
is actually defined
I don't bother processing the sublists individually. I just concat
them all together and fold that in a single pass.
I use Maybe
for the minimum and maximum since I need to handle the case of an empty list.
This code is faster
Here are the numbers:
$ time ./fold
numItems: 2000.0
numLists: 100000
[[1.0,2.0],[1.0,2.0]]
Foo {fooMin = Just 1.0, fooMax = Just 2000.0, fooSum = 3.435974e10}
real 0m5.796s
user 0m5.756s
sys 0m0.024s
foldl
is a really small and easy to learn library. You can learn more about it here.
Upvotes: 2
Reputation: 116139
Maybe a single fold is cheaper. Try running some tests with something like:
{-# LANGUAGE BangPatterns #-}
import Data.List
getLocalFoo :: [Float] -> Foo
getLocalFoo [] = error "getLocalFoo: empty list"
getLocalFoo (x:xs) = foldl' f (Foo x x x) xs
where f (Foo !min1 !max1 !sum1) y =
Foo (min1 `min` y) (max1 `max` y) (sum1 + y)
and its analogous for getGlobalFoo
.
Upvotes: 1
Reputation: 63379
Monoids to the rescue. All your operations - the sum, minimum and maximum - can be all expressed as monoids. For the minimum and maximum we need to wrap it into Option
from the semigroups, because we need to represent somehow the minimum and maximum of an empty collection. (An alternative way would be to restrict ourself to non-empty collections, then we could use semigroups instead of monoids.)
Another thing we'll need is to ensure that all computations are forced during each step. For this we declare Foo
's instance of NFData
, add some missing instances of the monoid types we use, and a helper function that forces values during the folding operation.
import Control.DeepSeq
import qualified Data.Foldable as F
import Data.Semigroup
-- Declare the data type so that each field is a monoid.
data Foo a = Foo
{ fooMin :: Option (Min a)
, fooMax :: Option (Max a)
, fooSum :: Sum a
} deriving Show
-- Make a Monoid instance - just by combining individual fields.
instance (Ord a, Num a) => Monoid (Foo a) where
mempty = Foo mempty mempty mempty
mappend (Foo n1 x1 s1) (Foo n2 x2 s2) = Foo (n1 <> n2) (x1 <> x2) (s1 <> s2)
-- Add missing NFData instances
instance (NFData a) => NFData (Option a) where
rnf (Option x) = rnf x `seq` ()
instance (NFData a) => NFData (Min a) where
rnf (Min x) = rnf x `seq` ()
instance (NFData a) => NFData (Max a) where
rnf (Max x) = rnf x `seq` ()
instance (NFData a) => NFData (Sum a) where
rnf (Sum x) = rnf x `seq` ()
-- Also add an instance for Foo
instance (NFData a) => NFData (Foo a) where
rnf (Foo n x s) = rnf n `seq` rnf x `seq` rnf s `seq` ()
-- Convert a single element into Foo.
locFoo :: a -> Foo a
locFoo x = Foo (return $ Min x) (return $ Max x) (Sum x)
-- A variant of foldMap that uses left fold and forces monoid
-- elements on the way.
foldMap' :: (F.Foldable f, Monoid m, NFData m) => (a -> m) -> f a -> m
foldMap' f = F.foldl' (\m x -> (mappend $!! m) (f x)) mempty
main :: IO()
main = do
let numItems = 2000
let numLists = 100000
putStrLn $ "numItems: " ++ show numItems
putStrLn $ "numLists: " ++ show numLists
-- Create an infinite list of lists of floats, x is [[Float]]
let x = take numLists $ repeat [1.0 .. numItems] :: [[Float]]
-- Print two first elements of each item
print $ take 2 (map (take 2) x)
-- First calculate local min/max/sum for each float list
-- then calculate the global min/max/sum based on the results.
print . foldMap' (foldMap' locFoo) $ x
Upvotes: 1