Nick
Nick

Reputation: 9031

How to build a lazy sequence of perfect number in Clojure?

I tried to find a list of perfect number in this way:

(defn classify [num] 
  (let [factors (->> (range 1 (inc num))
                     (filter #(zero? (rem num %))))
        sum (reduce + factors)
        aliquot-sum (- sum num)]
    (cond
      (= aliquot-sum num) :perfect
      (> aliquot-sum num) :abundant
      (< aliquot-sum num) :deficient)))

(defn is-p [n] 
  (= :perfect (classify n)))

(defn list-perfect [n]
  (filter is-p (range 1 (inc n))))

Question:

  1. How to build a lazy sequence of perfect numbers, so that I can use (take n ...) to easily get a list.

  2. Is this code idiomatic and efficient? Any improvement?

Thanks in advance.

Upvotes: 1

Views: 347

Answers (2)

Mark Fisher
Mark Fisher

Reputation: 9886

Your algorithm is very inefficient, it's O(n).

For a quick win, you can immediately reduce the range by a half, as you won't ever have factors that are greater than the number you're testing divided by 2.

So change it to:

(defn classify [num] 
  (let [factors (->> (range 1 (max 2 (inc (quot num 2))))
  ;; ...

However... you can change it to O(sqrt n) which is magnitudes faster. See my timings below.

The real efficiency is noticing that factors are in pairs of [x (quot num x)] and then only check the first (sqrt num) (or slightly over):

(defn perfect? [n]
  (let [r (range 2 (+ 2 (int (Math/sqrt n))))
        s (set (mapcat #(if (zero? (rem n %))
                          [% (quot n %)])
                       r))
        t (reduce + 1 s)]
    (= n t)))

I've split it into separate calculations so you can verify each stage.

The range can be reduced from 2..((sqrt n) + 2), and initialise the reduction with 1 (which is always a factor).

This changes the problem from an O(n) to O(sqrt n), so if you're checking large numbers, makes a vast difference.

As an illustration here are some times on larger values for n on my MBP:

         n            "n/2"      "sqrt n"
      33550336       1,172.5ms     2.85ms
    8589869056     274,346.6ms    16.76ms
  137438691328     didn't time    44.27ms

so using root version was 16,369 times faster for the 6th perfect number. See http://en.wikipedia.org/wiki/List_of_perfect_numbers for more details.

EDIT:

Why (int (root n)) + 2? And why `[x (quot n x)]?

When you work out the factors of a number n, then if you find one (say, a), then n/a is also a factor (call it b) because n = a * b

e.g. looking at 28, the first relevant factor is 2, and clearly 28/2 = 14 is also a factor. So you don't need to check 14, you already know it's a factor from the fact that 2 is.

as we're incrementally checking numbers from 2 upwards, we're incidentally finding the higher numbers coming down:

2 is a factor, 28 / 2 = 14 -> [a, b] = [2, 14]
4 is a factor, 28 / 4 = 7  -> [a, b] = [4,  7]
7 is a factor, 28 / 7 = 4  -> [a, b] = [7,  4] - wait a minute...

The [a,b] pairs here are the [% (quot n %)] in the mapcat function, e.g. when the range is currently iterating the value 2, then % is 2 inside the fuction, and so (quot n %) is (quot 28 2) which is 14, thus [% (quot n %)] is simply the vector [2 14], which then gets added to the set after being flattened to 2 and 14 as values. Later, when the range value is 4, then [% (quot n %)] is [4 (quot 28 4)] which is [4 7], and again is flattened by mapcat as the numbers 4 and 7.

So we add each pair of numbers (flattened via mapcat) to our set, include the number 1, and end up with #{1 2 14 4 7}, which are the factors of 28. (Actually, I don't put 1 in the set as I don't need to, instead I start the summing reduction at 1, which is same effect).

But at what point do they turn around? i.e. when do we know that [7,4] will already have been included in the set as [4,7]?

clearly it's when a > b, because in finding the lowest numbers we always find the highest number with it, so we can finish checking at this point.

but what is this point? it's simple, if a perfect number were a square number, then a and b would be equal, i.e. a*a = n, so a = sqrt(n).

Thus the highest value of a we need to check is the whole number that is larger than the root of n.

e.g. for 28, sqrt(28) = 5.292, so we have to check 6 to be sure that we've included the lowest number possible that could be a factor that has a paired factor.

so we need (int (sqrt n)) + 1.

I always do this in case the root calculation is 1.9999999999... and rounds wrong way, so adding 1 more ensures you eliminate any rounding errors.

but in a range, if you want to include that number you have to add 1 more to it (range drops the high number, (range 6) = (0 1 2 3 4 5)), hence why it adds 2 to the value: 1 for the range, 1 to ensure it's above the rounded down root.

Although, after saying this, I've tested perfect numbers up to 2305843008139952128 and it works with +1 instead of +2, not that it's a massive saving. Probably because non of the perfect numbers are close to perfect squares in ones I checked, so there's no rounding error in (int (sqrt n)).

If you're interested in perfect numbers, I'd recommend reading http://britton.disted.camosun.bc.ca/perfect_number/lesson1.html

Upvotes: 2

schaueho
schaueho

Reputation: 3504

list-perfect is already lazy due to your usage of filter:

(filter pred coll)

Returns a lazy sequence of the items in coll for which (pred item) returns true. pred must be free of side-effects.

Whether code is idiomatic or not might be a matter of opinion (and hence off-topic), but it looks good enough from my perspective.

Upvotes: 1

Related Questions