Anarcho-Chossid
Anarcho-Chossid

Reputation: 2330

Write a variable number of arguments for IF

I am trying to write a function that would solve any general version of this problem:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

The way I solved it for this instance was:

multiples = Array(Int, 0)
[ (i % 3 == 0 || i % 5 == 0) && push!(multiples, i) for i in 1:1000 ]
sum(multiples)

I want to write a function that will take an array of multiples (in this case, [3,5]) and the final number (in this case, 1000). The point is that the array can consist of arbitrarily many numbers, not just two (e.g., [3,5,6]). Then the function should run i % N == 0 for each N.

How do I do that most efficiently? Could that involve metaprogramming? (The code doesn't have to be in a list comprehension format.)

Thanks!

Upvotes: 2

Views: 173

Answers (2)

IainDunning
IainDunning

Reputation: 11644

Very first thing that popped into my head was the following, using modulo division and a functional style:

v1(N,M) = sum(filter(k->any(j->k%j==0,M), 1:N))

But I wanted to investigate some alternatives, as this has two problems:

  1. Two levels of anonymous function, something Julia doesn't optimize well (yet).
  2. Creates a temporary array from the range, then sums.

So here is the most obvious alternative, the C-style version of the one-liner:

function v2(N,M)
    sum_so_far = 0
    for k in 1:N
        for j in M
            if k%j==0
                sum_so_far += k
                break
            end
        end
    end
    return sum_so_far
end

But then I thought about it some more, and remembered reading somewhere that modulo division is a slow operation. I wanted to see how IntSets perform - a set specialized for integers. So here is another one-liner, IntSets without using any module division, and a functional style!

v3(N,M) = sum(union(map(j->IntSet(j:j:N), M)...))

Expanding the map into a for loop and repeatedly applying union! to a single IntSet was not much better, so I won't include that here. To break this down:

  • IntSet(j:j:N) is all the multiples of j between j and N
  • j->IntSet(j:j:N) is an anonymous function that returns that IntSet
  • map(j->IntSet(j:j:N), M) applies that function to each j in M, and returns a Vector{IntSet}.
  • The ... "splats" the vector out into arguments of union
  • union creates an IntSet that is the union of its arguments - in this case, all multiples of the numbers in M
  • Then we sum it to finish

I benchmarked these with

N,M = 10000000, [3,4,5]

which gives you

  • One-liner: 2.857292874 seconds (826006352 bytes allocated, 10.49% gc time)
  • C-style: 0.190581908 seconds (176 bytes allocated)
  • IntSet no modulo: 0.121820101 seconds (16781040 bytes allocated)

So you can definitely even beat C-style code with higher level objects - modulo is that expensive I guess! The neat thing about the no modulo one is it parallelizes quite easily:

addprocs(3)
@everywhere worker(j,N) = IntSet(j:j:N)
v4(N,M) = sum(union(pmap(j->worker(j,N),M)...))
@show v4(1000, [3,5])
@time v3(1000000000,[3,4,5]);  # bigger N than before
@time v4(1000000000,[3,4,5]);

which gives

elapsed time: 12.279323079 seconds (2147831540 bytes allocated, 0.94% gc time)
elapsed time: 10.322364457 seconds (1019935752 bytes allocated, 0.71% gc time)

which isn't much better, but its something I suppose.

Upvotes: 3

Colin T Bowers
Colin T Bowers

Reputation: 18530

Okay, here is my updated answer.

Based on the benchmarks in the answer of @IainDunning, the method to beat is his v2. My approach below appears to be much faster, but I'm not clever enough to generalize it to input vectors of length greater than 2. A good mathematician should be able to improve on my answer.

Quick intuition: For the case of length(M)=2, the problem reduces to the sum of all multiples of M[1] up to N added to the sum of all multiples of M[2] up to N, where, to avoid double-counting, we then need to subtract the sum of all multiples of M[1]*M[2] up to N. A similar algorithm could be implemented for M > 2, but the double-counting issue becomes much more complicated very quickly. I suspect a general algorithm for this would definitely exist (it is the kind of issue that crops up all the time in the field of combinatorics) but I don't know it off the top of my head.

Here is the test code for my approach (f1) versus v2:

function f1(N, M)
  if length(M) > 2
    error("I'm not clever enough for this case")
  end
  runningSum = 0
  for c = 1:length(M)
    runningSum += sum(M[c]:M[c]:N)
  end
  for c1 = 1:length(M)
    for c2 = c1+1:length(M)
      temp1 = M[c1]*M[c2]
      runningSum -= sum(temp1:temp1:N)
    end
  end
  return(runningSum)
end    

function v2(N, M)
  sum_so_far = 0
  for k in 1:N
    for j in M
      if k%j==0
        sum_so_far += k
        break
      end
    end
  end
  return sum_so_far
end

f1(1000, [3,5])
v2(1000, [3,5])

N = 10000000
M = [3,5]

@time f1(N, M)
@time v2(N, M)

Timings are:

elapsed time: 4.744e-6 seconds (96 bytes allocated)
elapsed time: 0.201480996 seconds (96 bytes allocated)

Sorry, this is an interesting problem but I'm afraid I have to get back to work :-) I'll check back in later if I get a chance...

Upvotes: 1

Related Questions