Reputation: 323
I'm a newbie in Clojure; I started learning the language two months ago. I'm reading "The Joy of Clojure" and I found a min-by function in the functional programming chapter. After some thought I wrote my own min-by function, which seems to perform at least 50% better on 10,000 items. Here are the functions:
; the test vector with random data
(def my-rand-vec (vec (take 10000 (repeatedly #(rand-int 10000)))))

; the Joy of Clojure min-by
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))

(time (min-by-reduce eval my-rand-vec))

; my poor min-by
(defn min-by-sort [f coll]
  (first (sort (map f coll))))

(time (min-by-sort eval my-rand-vec))
The terminal output is:
"Elapsed time: 91.657505 msecs"
"Elapsed time: 62.441513 msecs"
Are there any performance or resource drawbacks to my solution? I'm really curious to see a more elegant Clojure solution for this function from the Clojure gurus.
EDIT
Clearer test code with Criterium:
(ns min-by.core
  (:gen-class))

(use 'criterium.core)

(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))

(defn min-by-sort [f coll]
  (first (sort-by f coll)))

(defn my-rand-map [length]
  (map #(hash-map :resource %1 :priority %2)
       (take length (repeatedly #(rand-int 200)))
       (take length (repeatedly #(rand-int 10)))))

(defn -main
  [& args]
  (let [rand-map (my-rand-map 100000)]
    (println "min-by-reduce-----------")
    (quick-bench (min-by-reduce :resource rand-map))
    (println "min-by-sort-------------")
    (quick-bench (min-by-sort :resource rand-map))
    (println "min-by-min-key----------")
    (quick-bench (apply min-key :resource rand-map))))
Terminal output is:
min-by-reduce-----------
Evaluation count : 60 in 6 samples of 10 calls.
Execution time mean : 11,366539 ms
Execution time std-deviation : 2,045752 ms
Execution time lower quantile : 9,690590 ms ( 2,5%)
Execution time upper quantile : 14,763746 ms (97,5%)
Overhead used : 3,292762 ns
Found 1 outliers in 6 samples (16,6667 %)
low-severe 1 (16,6667 %)
Variance from outliers : 47,9902 % Variance is moderately inflated by outliers
min-by-sort-------------
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 174,747463 ms
Execution time std-deviation : 18,431608 ms
Execution time lower quantile : 158,138543 ms ( 2,5%)
Execution time upper quantile : 203,420044 ms (97,5%)
Overhead used : 3,292762 ns
Found 1 outliers in 6 samples (16,6667 %)
low-severe 1 (16,6667 %)
Variance from outliers : 30,7324 % Variance is moderately inflated by outliers
min-by-min-key----------
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 17,405529 ms
Execution time std-deviation : 1,661902 ms
Execution time lower quantile : 15,962259 ms ( 2,5%)
Execution time upper quantile : 19,366893 ms (97,5%)
Overhead used : 3,292762 ns
Upvotes: 1
Views: 645
Reputation: 13483
The standard min-key
needs only a little adapting:
(defn min-by [f coll]
  (when (seq coll)
    (apply min-key f coll)))
If you look at the source code for min-key, you'll see this is essentially the same as JoC's min-by-reduce.
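A quick REPL check of this adaptation (the sample maps here are made up for illustration):

```clojure
(defn min-by [f coll]
  (when (seq coll)
    (apply min-key f coll)))

;; returns the original map, not the key's value
(min-by :resource [{:resource 5} {:resource 2} {:resource 7}])
;=> {:resource 2}

;; empty collections are handled by the (seq coll) guard
(min-by :resource [])
;=> nil
```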
Upvotes: 0
Reputation: 323
I've changed the min-by-sort function, and the whole picture changes.
(defn min-by-sort [f coll]
  (first (sort-by f coll)))
The terminal output for the two functions on 10,000 items is:
"Elapsed time: 0.863016 msecs"
"Elapsed time: 11.44852 msecs"
So the question: is there a better or more elegant min-by-xxx function that finds the minimum in a collection according to a function and returns the original value?
And a final test with maps and a keyword as the key function:
; find the minimum by reduce
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))

; find the minimum by sort-by
(defn min-by-sort [f coll]
  (first (sort-by f coll)))

; a helper function to build a sequence of {:resource x, :priority y} maps
(defn my-rand-map [length]
  (map #(hash-map :resource %1 :priority %2)
       (take length (repeatedly #(rand-int 200)))
       (take length (repeatedly #(rand-int 10)))))

; test with 100 items in the seq
(let [rand-map (my-rand-map 100)]
  (time (min-by-reduce :resource rand-map))
  (time (min-by-sort :resource rand-map)))
Test 100 items
"Elapsed time: 0.245403 msecs"
"Elapsed time: 0.18094 msecs"
Test 1000 items
"Elapsed time: 2.653952 msecs"
"Elapsed time: 3.214373 msecs"
Test 10.000 items
"Elapsed time: 14.275679 msecs"
"Elapsed time: 38.064996 msecs"
I think the difference is that sort-by orders all the items, of course, while reduce just walks through the items once and accumulates the current minimum. Is that true?
Upvotes: 0
Reputation: 17537
First, your version returns (f min) instead of min. And on theoretical grounds, finding a minimum is a linear O(n) operation, while sorting and taking the first element is quasilinear O(n log n). For small vectors it may be difficult to get accurate timing results, and for that matter, time complexities don't guarantee that quasilinear operations are always slower than linear ones!
Try with sample sizes of 1,000,000 or more, and use more complex key functions. For instance, generate sample strings and use their length or something similar as the sort key. That way you get more realistic results.
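A sketch of such a test (the string generator and sizes here are illustrative, and min-by-reduce is the version from the question):

```clojure
;; min-by-reduce, as defined in the question
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other)) other min))
            coll)))

;; build n random lowercase strings of length 1-50
(defn rand-strings [n]
  (repeatedly n
              #(apply str (repeatedly (inc (rand-int 50))
                                      (fn [] (char (+ (int \a) (rand-int 26))))))))

;; compare the two approaches with count (string length) as the key
(let [ss (doall (rand-strings 1000000))]
  (time (min-by-reduce count ss))
  (time (first (sort-by count ss))))
```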
Tip: instead of eval you can use identity to "skip" giving a key function for your test purposes. It probably won't affect the benchmark much, but it's good to be aware of the function.
As user ClojureMostly pointed out, eval is a big bottleneck and skewed the benchmark towards the wrong conclusion.
Upvotes: 1
Reputation: 29984
I believe the JoC was trying to illustrate the use of reduce, nothing more.
I too read JoC as my first book, but I wish I had saved it until I had gone through more introductory books first. There are quite a few good ones out there. You can even read (most of) Clojure for the Brave and True online: http://www.braveclojure.com/clojure-for-the-brave-and-true/ I also recommend buying the full hardcopy version.
You should also check out the Clojure Cookbook: https://github.com/clojure-cookbook/clojure-cookbook As before, I also recommend buying the full hardcopy version.
Upvotes: 0