Reputation: 2330
I am trying to write a function that would solve any general version of this problem:
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
The way I solved it for this instance was:
multiples = Array(Int, 0)
[ (i % 3 == 0 || i % 5 == 0) && push!(multiples, i) for i in 1:1000 ]
sum(multiples)
I want to write a function that will take an array of multiples (in this case, [3,5]) and the final number (in this case, 1000). The point is that the array can consist of arbitrarily many numbers, not just two (e.g., [3,5,6]). Then the function should run i % N == 0
for each N
.
How do I do that most efficiently? Could that involve metaprogramming? (The code doesn't have to be in a list comprehension format.)
Thanks!
Upvotes: 2
Views: 173
Reputation: 11644
Very first thing that popped into my head was the following, using modulo division and a functional style:
v1(N,M) = sum(filter(k->any(j->k%j==0,M), 1:N))
But I wanted to investigate some alternatives, as this has two problems:
So here is the most obvious alternative, the C-style version of the one-liner:
function v2(N,M)
sum_so_far = 0
for k in 1:N
for j in M
if k%j==0
sum_so_far += k
break
end
end
end
return sum_so_far
end
But then I thought about it some more, and remembered reading somewhere that modulo division is a slow operation. I wanted to see how IntSet
s perform - a set specialized for integers. So here is another one-liner, IntSet
s without using any module division, and a functional style!
v3(N,M) = sum(union(map(j->IntSet(j:j:N), M)...))
Expanding the map into a for
loop and repeatedly applying union!
to a single IntSet
was not much better, so I won't include that here. To break this down:
IntSet(j:j:N)
is all the multiples of j between j and Nj->IntSet(j:j:N)
is an anonymous function that returns that IntSet
map(j->IntSet(j:j:N), M)
applies that function to each j in M, and returns a Vector{IntSet}
....
"splats" the vector out into arguments of union
union
creates an IntSet
that is the union of its arguments - in this case, all multiples of the numbers in MI benchmarked these with
N,M = 10000000, [3,4,5]
which gives you
2.857292874 seconds (826006352 bytes allocated, 10.49% gc time)
0.190581908 seconds (176 bytes allocated)
0.121820101 seconds (16781040 bytes allocated)
So you can definitely even beat C-style code with higher level objects - modulo is that expensive I guess! The neat thing about the no modulo one is it parallelizes quite easily:
addprocs(3)
@everywhere worker(j,N) = IntSet(j:j:N)
v4(N,M) = sum(union(pmap(j->worker(j,N),M)...))
@show v4(1000, [3,5])
@time v3(1000000000,[3,4,5]); # bigger N than before
@time v4(1000000000,[3,4,5]);
which gives
elapsed time: 12.279323079 seconds (2147831540 bytes allocated, 0.94% gc time)
elapsed time: 10.322364457 seconds (1019935752 bytes allocated, 0.71% gc time)
which isn't much better, but its something I suppose.
Upvotes: 3
Reputation: 18530
Okay, here is my updated answer.
Based on the benchmarks in the answer of @IainDunning, the method to beat is his v2
. My approach below appears to be much faster, but I'm not clever enough to generalize it to input vectors of length greater than 2. A good mathematician should be able to improve on my answer.
Quick intuition: For the case of length(M)=2
, the problem reduces to the sum of all multiples of M[1]
up to N
added to the sum of all multiples of M[2]
up to N
, where, to avoid double-counting, we then need to subtract the sum of all multiples of M[1]*M[2]
up to N
. A similar algorithm could be implemented for M > 2
, but the double-counting issue becomes much more complicated very quickly. I suspect a general algorithm for this would definitely exist (it is the kind of issue that crops up all the time in the field of combinatorics) but I don't know it off the top of my head.
Here is the test code for my approach (f1
) versus v2
:
function f1(N, M)
if length(M) > 2
error("I'm not clever enough for this case")
end
runningSum = 0
for c = 1:length(M)
runningSum += sum(M[c]:M[c]:N)
end
for c1 = 1:length(M)
for c2 = c1+1:length(M)
temp1 = M[c1]*M[c2]
runningSum -= sum(temp1:temp1:N)
end
end
return(runningSum)
end
function v2(N, M)
sum_so_far = 0
for k in 1:N
for j in M
if k%j==0
sum_so_far += k
break
end
end
end
return sum_so_far
end
f1(1000, [3,5])
v2(1000, [3,5])
N = 10000000
M = [3,5]
@time f1(N, M)
@time v2(N, M)
Timings are:
elapsed time: 4.744e-6 seconds (96 bytes allocated)
elapsed time: 0.201480996 seconds (96 bytes allocated)
Sorry, this is an interesting problem but I'm afraid I have to get back to work :-) I'll check back in later if I get a chance...
Upvotes: 1