Reputation: 323
I'm a newbie in Clojure; I started learning the language two months ago. I'm reading "The Joy of Clojure" and I found a min-by function in the functional programming chapter. After some thought I wrote my own min-by function, which seems to perform at least 50% better on 10,000 items. Here are the functions:
; the test vector with random data
(def my-rand-vec (vec (take 10000 (repeatedly #(rand-int 10000)))))

; the Joy of Clojure min-by
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))

(time (min-by-reduce eval my-rand-vec))

; my poor min-by
(defn min-by-sort [f coll]
  (first (sort (map f coll))))

(time (min-by-sort eval my-rand-vec))
The terminal output is:
"Elapsed time: 91.657505 msecs"
"Elapsed time: 62.441513 msecs"
Are there any performance or resource drawbacks to my solution? I'm really curious to see a more elegant Clojure solution for this function from the Clojure gurus.
EDIT
Clearer test code with Criterium:
(ns min-by.core
  (:gen-class))

(use 'criterium.core)

(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))

(defn min-by-sort [f coll]
  (first (sort-by f coll)))

(defn my-rand-map [length]
  (map #(hash-map :resource %1 :priority %2)
       (take length (repeatedly #(rand-int 200)))
       (take length (repeatedly #(rand-int 10)))))

(defn -main
  [& args]
  (let [rand-map (my-rand-map 100000)]
    (println "min-by-reduce-----------")
    (quick-bench (min-by-reduce :resource rand-map))
    (println "min-by-sort-------------")
    (quick-bench (min-by-sort :resource rand-map))
    (println "min-by-min-key----------")
    (quick-bench (apply min-key :resource rand-map))))
Terminal output is:
min-by-reduce-----------
Evaluation count : 60 in 6 samples of 10 calls.
Execution time mean : 11,366539 ms
Execution time std-deviation : 2,045752 ms
Execution time lower quantile : 9,690590 ms ( 2,5%)
Execution time upper quantile : 14,763746 ms (97,5%)
Overhead used : 3,292762 ns
Found 1 outliers in 6 samples (16,6667 %)
low-severe 1 (16,6667 %)
Variance from outliers : 47,9902 % Variance is moderately inflated by outliers
min-by-sort-------------
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 174,747463 ms
Execution time std-deviation : 18,431608 ms
Execution time lower quantile : 158,138543 ms ( 2,5%)
Execution time upper quantile : 203,420044 ms (97,5%)
Overhead used : 3,292762 ns
Found 1 outliers in 6 samples (16,6667 %)
low-severe 1 (16,6667 %)
Variance from outliers : 30,7324 % Variance is moderately inflated by outliers
min-by-min-key----------
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 17,405529 ms
Execution time std-deviation : 1,661902 ms
Execution time lower quantile : 15,962259 ms ( 2,5%)
Execution time upper quantile : 19,366893 ms (97,5%)
Overhead used : 3,292762 ns
Upvotes: 1
Views: 645
Reputation: 13483
The standard min-key
needs only a little adapting:
(defn min-by [f coll]
  (when (seq coll)
    (apply min-key f coll)))
If you look at the source code for min-key, you'll see this is essentially the same as JoC's min-by-reduce.
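A quick REPL check of this adaptation (the sample maps here are made up for illustration):

```clojure
(defn min-by [f coll]
  (when (seq coll)
    (apply min-key f coll)))

;; returns the original map, not the key's value
(min-by :resource [{:resource 5} {:resource 2} {:resource 7}])
;=> {:resource 2}

;; empty collections are handled by the (seq coll) guard
(min-by :resource [])
;=> nil
```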
Upvotes: 0
Reputation: 323
I've changed the min-by-sort function, and the whole picture changes.
(defn min-by-sort [f coll]
  (first (sort-by f coll)))
The terminal output for the two functions on 10,000 items is:
"Elapsed time: 0.863016 msecs"
"Elapsed time: 11.44852 msecs"
So the question: is there a better or more elegant min-by-xxx function that finds the minimum in a collection according to a function and returns the original value?
And a final test with maps and a keyword as the key function:
; find the minimum by reduce
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other))
                other
                min))
            coll)))

; find the minimum by sort-by
(defn min-by-sort [f coll]
  (first (sort-by f coll)))

; a helper function to build a sequence of {:resource x, :priority y} maps
(defn my-rand-map [length]
  (map #(hash-map :resource %1 :priority %2)
       (take length (repeatedly #(rand-int 200)))
       (take length (repeatedly #(rand-int 10)))))

; test with 100 items in the seq
(let [rand-map (my-rand-map 100)]
  (time (min-by-reduce :resource rand-map))
  (time (min-by-sort :resource rand-map)))
Test 100 items
"Elapsed time: 0.245403 msecs"
"Elapsed time: 0.18094 msecs"
Test 1000 items
"Elapsed time: 2.653952 msecs"
"Elapsed time: 3.214373 msecs"
Test 10.000 items
"Elapsed time: 14.275679 msecs"
"Elapsed time: 38.064996 msecs"
I think the difference is that sort-by orders all the items, of course, while reduce just walks through the items once and accumulates the current minimum. Is that true?
Upvotes: 0
Reputation: 17537
First, your version returns (f min) instead of min. And on theoretical grounds, finding a minimum is a linear O(n) operation, while sorting and taking the first element is quasilinear O(n log n). For small vectors it may be difficult to get accurate timing results, and for that matter, time complexities don't guarantee that quasilinear operations are always slower than linear ones!
Try with sample sizes of 1,000,000 or more, and use more complex key functions. For instance, generate sample strings and use their length or something similar as the sort key. That way you get more realistic results.
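A sketch of such a test (the string generator and sizes here are illustrative, and min-by-reduce is the version from the question):

```clojure
;; min-by-reduce, as defined in the question
(defn min-by-reduce [f coll]
  (when (seq coll)
    (reduce (fn [min other]
              (if (> (f min) (f other)) other min))
            coll)))

;; build n random lowercase strings of length 1-50
(defn rand-strings [n]
  (repeatedly n
              #(apply str (repeatedly (inc (rand-int 50))
                                      (fn [] (char (+ (int \a) (rand-int 26))))))))

;; compare the two approaches with count (string length) as the key
(let [ss (doall (rand-strings 1000000))]
  (time (min-by-reduce count ss))
  (time (first (sort-by count ss))))
```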
Tip: instead of eval you can use identity to "skip" giving a key function for your test purposes. It probably won't affect the benchmark much, but it's good to be aware of the function.
As user ClojureMostly pointed out, eval is a big bottleneck and skewed the benchmark towards the wrong conclusion.
Upvotes: 1
Reputation: 29984
I believe the JoC was trying to illustrate the use of reduce, nothing more.
I too read JoC as my first book, but I wish I had saved it until I had gone through more introductory books first. There are quite a few good ones out there. You can even read (most of) Clojure for the Brave and True online: http://www.braveclojure.com/clojure-for-the-brave-and-true/ I also recommend buying the full hardcopy version.
You should also check out the Clojure Cookbook: https://github.com/clojure-cookbook/clojure-cookbook As before, I also recommend buying the full hardcopy version.
Upvotes: 0