Reputation: 4698
I read in Foldr Foldl Foldl' that foldl'
is more efficient for long finite lists because of its strictness. I am aware that it is not suitable for infinite lists.
Thus, I am limiting the comparison to long finite lists only.
concatMap
is implemented using foldr
, which gives it laziness. However, according to the article, using it with long finite lists will build up a long unreduced chain.
concatMap :: Foldable t => (a -> [b]) -> t a -> [b]
concatMap f xs = build (\c n -> foldr (\x b -> foldr c b (f x)) n xs)
Thus, I came up with the following implementation using foldl'
.
concatMap' :: Foldable t => (a -> [b]) -> t a -> [b]
concatMap' f = reverse . foldl' (\acc x -> f x ++ acc) []
I built the following two functions to test the performance.
lastA = last . concatMap (: []) $ [1..10000]
lastB = last . concatMap' (: []) $ [1..10000]
However, I was shocked by the results.
lastA:
(0.23 secs, 184,071,944 bytes)
(0.24 secs, 184,074,376 bytes)
(0.24 secs, 184,071,048 bytes)
(0.24 secs, 184,074,376 bytes)
(0.25 secs, 184,075,216 bytes)
lastB:
(0.81 secs, 224,075,080 bytes)
(0.76 secs, 224,074,504 bytes)
(0.78 secs, 224,072,888 bytes)
(0.84 secs, 224,073,736 bytes)
(0.79 secs, 224,074,064 bytes)
concatMap
outcompetes my concatMap'
in both time and memory. I wonder whether there are mistakes in my concatMap'
implementation.
Thus, I doubt the article's claims about the benefits of foldl'
.
Is there any black magic in concatMap
that makes it so efficient?
Is it true that foldl'
is more efficient for long finite lists?
Is it true that using foldr
with long finite lists will build up a long unreduced chain and hurt performance?
Upvotes: 2
Views: 952
Reputation: 116139
Is there any black magic in concatMap that makes it so efficient?
No, not really.
Is it true that
foldl'
is more efficient for long finite lists?
Not always. It depends on the folding function.
The point is, foldl
and foldl'
always have to scan the whole input list before producing any output. By contrast, foldr
does not always have to.
As an extreme case, consider
foldr (\x xs -> x) 0 [10..10000000]
which evaluates to 10
instantly -- only the first element of the list is evaluated. The reduction goes something like
foldr (\x xs -> x) 0 [10..10000000]
= foldr (\x xs -> x) 0 (10 : [11..10000000])
= (\x xs -> x) 10 (foldr (\x xs -> x) 0 [11..10000000])
= (\xs -> 10) (foldr (\x xs -> x) 0 [11..10000000])
= 10
and the recursive call is not evaluated thanks to laziness.
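The same reduction can be checked directly; here is a small self-contained sketch (firstOrDefault is an illustrative name, not a library function):

```haskell
-- foldr with a function that ignores its second argument
-- never forces the recursive call, so it returns as soon
-- as the first element is available.
firstOrDefault :: b -> [b] -> b
firstOrDefault d = foldr (\x _ -> x) d

main :: IO ()
main = do
  print (firstOrDefault 0 [10..10000000])   -- returns 10 without scanning the list
  print (firstOrDefault (-1) ([] :: [Int])) -- falls back to the default
```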
In general, when computing foldr f a xs
, it is important to check whether f y ys
is able to construct a part of the output before evaluating ys
. For instance
foldr f [] xs
where f y ys = (2*y) : ys
produces a list cell _ : _
before evaluating 2*y
and ys
. This makes it an excellent candidate for foldr
.
Again, we can define
map f xs = foldr (\y ys -> f y : ys) [] xs
which runs just fine. It consumes one element from xs
and outputs the first cell of the result. Then it consumes the next element, outputs the next cell, and so on. Using foldl'
would not output anything until the whole list has been processed, making the code quite inefficient.
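The streaming difference is easy to observe; this sketch contrasts the two variants (mapFoldr and mapFoldl' are made-up names):

```haskell
import Data.List (foldl')

-- foldr-based map: produces output cells incrementally,
-- so it even works on infinite lists.
mapFoldr :: (a -> b) -> [a] -> [b]
mapFoldr f = foldr (\y ys -> f y : ys) []

-- foldl'-based map: must consume the entire input before
-- producing anything, so it diverges on infinite lists.
mapFoldl' :: (a -> b) -> [a] -> [b]
mapFoldl' f = reverse . foldl' (\acc y -> f y : acc) []

main :: IO ()
main = do
  print (take 5 (mapFoldr (*2) [1..]))  -- streams from an infinite list
  print (mapFoldl' (*2) [1..5])         -- fine on finite input
  -- take 5 (mapFoldl' (*2) [1..]) would never terminate
```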
Instead, if we wrote
sum xs = foldr (\y ys -> y+ys) 0 xs
then we cannot output anything even after the first element of xs
is consumed: we just build a long chain of thunks, wasting a lot of memory.
Here, foldl'
would instead work in constant space.
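For comparison, here is a sketch of both sum variants (sumFoldr and sumFoldl' are illustrative names):

```haskell
import Data.List (foldl')

-- Builds the chain 1 + (2 + (3 + ...)) of unreduced thunks
-- before any addition happens: wastes space on long lists.
sumFoldr :: Num a => [a] -> a
sumFoldr = foldr (+) 0

-- Forces the accumulator at every step: constant space.
sumFoldl' :: Num a => [a] -> a
sumFoldl' = foldl' (+) 0

main :: IO ()
main = print (sumFoldl' [1..1000000 :: Int])
```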
Is it true that using
foldr
with long finite lists will build up a long unreduced chain and impact the performance?
Not always. It strongly depends on how the output is consumed by the caller.
As a rule of thumb, if the output is "atomic", meaning that the output consumer cannot observe only a part of it (e.g. Bool, Int, ...
) then it's better to use foldl'
. If the output is "composed" of many independent values (lists, trees, ...), then foldr
is probably a better choice, provided f
can produce its output step-by-step, in a "streaming" fashion.
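This is exactly the situation of the foldr-based concatMap from the question: its folding function produces list cells incrementally, so consumers like last or take never face one giant unreduced chain. A quick sanity check (it even streams from infinite input):

```haskell
main :: IO ()
main = do
  -- concatMap is foldr-based, so output cells appear as
  -- soon as each chunk is produced:
  print (take 6 (concatMap (\x -> [x, x]) [1..]))
  -- the question's lastA, which runs in one pass:
  print (last (concatMap (: []) [1..10000 :: Int]))
```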
Upvotes: 6