Ertuğrul Çetin
Ertuğrul Çetin

Reputation: 5231

What is the difference between Clojure clojure.core.reducers/fold and Scala fold?

I came across that Clojure has clojure.core.reducers/fold function.

Also Scala has built-in fold function but could not understand that they are working differently or not?

Upvotes: 2

Views: 537

Answers (2)

Andrey Tyukin
Andrey Tyukin

Reputation: 44967

I assume that you are talking about clojure.core.reducers/fold.

Scala's default fold implementation on sequences is very simple:

collection.fold(identityElem)(binOp)

simply starts with the identityElem and then traverses the collection sequentially, and applies the binary operation binOp to the already accumulated result and the current sequence value, e.g.

(1 to 3).fold(42000)(_ + _)

will result in 42000 + 1 + 2 + 3 = 42006.

Clojure's fold with the full signature

(r/fold n combinef reducef coll)

from the above mentioned package works in parallel in two stages. First, it splits the input into smaller groups of size n (approximately), then reduces each group using reducef, and finally combines the results of each group using combinef.

The main difference is that combinef is expected to be both a zeroary and binary at the same time (Clojure has multi-ary functions), and (combinef) (without arguments) will be invoked to produce identity elements for each partition (thus, this documentation is correct, and this documentation lies).

That is, in order to simulate Scala's fold from the above example, one would have to write something like this:

(require '[clojure.core.reducers :as r])

(r/fold 3 (fn ([] 42000) ([x y] y)) + [1 2 3])

And in general, Scala's fold

collection.fold(identityElement)(binOp)

can be emulated by reducers/fold as follows:

(r/fold collectionSize (fn ([] identityElem) ([x y] y)) binOp collection)

(note the ([x y] y) contraption that throws away the first argument, it's intentional).

I guess the interface wasn't intended to be used with any zero-binary operations that are not monoids, that's the reason why Scala's fold is so awkward to simulate using Clojure's fold. If you want something that behaves like Scala's fold, use reduce in Clojure.


EDIT

Oh, wait. The documentation actually states that

combinef must be associative, and, when called with no arguments, (combinef) must produce its identity element

that is, we are actually forced to use a monoid as the combinef, so the above 42000, ([x y] y)-example is actually invalid, and the behavior is actually undefined. The fact that I somehow got the 42006 out was a hack in the strictly technical sense that it relied on undefined behavior of a library function to obtain the desired result 42006.

Taking this extra information into account, I'm not sure whether Scala's fold can be simulated by Clojure's core.reducers/fold at all. Clojure's fold seems to be constrained to reductions with a monoid, whereas Scala's fold is closer to the general List catamorphism, at the expense of parallelism.

Upvotes: 2

Alan Thompson
Alan Thompson

Reputation: 29984

The clojure.core.reducers namespace is a specialized implementation designed for parallel processing of large datasets. You can find full docs here:

https://clojure.org/reference/reducers.

(r/fold reducef coll)
(r/fold combinef reducef coll)
(r/fold n combinef reducef coll)

r/fold takes a reducible collection and partitions it into groups of approximately n (default 512) elements. Each group is reduced using the reducef function. The reducef function will be called with no arguments to produce an identity value in each partition. The results of those reductions are then reduced with the combinef (defaults to reducef) function. When called with no arguments, (combinef) must produce its identity element - this will be called multiple times. Operations may be performed in parallel. Results will preserve order.


Until you are maxing out your machine, you should just stick to the basic reduce function:

https://clojuredocs.org/clojure.core/reduce

This is essentially the same as Scala's fold function:

(reduce + 0 [1 2 3 4 5]) => 15

where the function signature is:

(reduce <op> <init-val> <collection-to-be-reduced> )

Upvotes: 2

Related Questions