Reputation: 7435
We were given a simple task to come up with the most efficient way we can to sum all the numbers between a start and end point ('from' and 'to') using recursion and iteration respectively, without using the obvious formula which would be O(1).
There is no application for this, I am simply curious and challenged to see if my solution can be improved / polished more than it already is:
/* recursion */
unsigned int sum1(unsigned int from, unsigned int to) {
if (to - from < 2)
return from + (from == to ? 0 : to);
else
return from + to + sum1(from + 1, to - 1);
}
/* iteration */
unsigned int sum2(unsigned int from, unsigned int to) {
int p = to - from;
if (p == 0) return from;
int i, s, n = p / 2;
if (p % 2 == 0) s = n + from;
else {
s = 0;
n++;
}
for (i = 0; i < n; i++) {
s += from++ + to--;
}
return s;
}
Upvotes: 4
Views: 463
Reputation: 73
The function:
long sum(long lower, long upper) {
long s = 0;
s = ((upper * (upper + 1)) - (lower - 1) * (lower))/2;
return s;
}
called with parameters: (1,9999999) returns 49999995000000 which agrees with summation formula n(n+1)/2 and runs with the following profile on a core duo:
real 0m0.005s
user 0m0.002s
sys 0m0.003s
It might be worth checking your functions, I can't see them returning this result - the mathematical option is a far superior solution ;)
Upvotes: 0
Reputation: 56
You can use a segment tree to get the sum on the segement from i to j. This structure has O(log n) lookup time.
Upvotes: 0
Reputation: 1765
I tried improving the iterative version:
unsigned int sum2_improved(unsigned int from, unsigned int to) {
int p = to - from;
if (p == 0) return from;
int x = to + from;
int s = 0;
int i;
for (i = p >> 1; i > 0; i--)
{
s += x;
}
s += (p % 2 == 0) ? x >> 1 : x;
return s;
}
I tested your version with:
for (i = 0; i < 9999999; i++) sum2(1,999);
This is what I see:
$ time ./addm
real 0m18.315s
user 0m18.220s
sys 0m0.015s
I tried my implementation with the same number of loops. Here's how the improved function performed:
$ time ./addm
real 0m14.196s
user 0m14.070s
sys 0m0.015s
UPDATE
In my implementation x = to + from
is the sum of the first and the last number in the sequence. If you consider any consecutive sequence of integers, and sum the first and last, the second and the penultimate, and so on ... all these sum up to the same value. For example, in (1 ... 6), 1 + 6 = 2 + 5 = 3 + 4 = 7
. However, with a sequence containing odd number of elements, you are left with the middle number which you will then have to add to the cumulative sum (that's what the assignment following the for
loop was doing.
Also, note that this is still O(n)
. I realized after I initially posted my answer that my approach can actually be done in constant time. Here's the updated code:
unsigned int sum0(unsigned int from, unsigned int to) {
int p = to - from;
if (p == 0) return from;
int x = to + from;
int s = 0;
s += (x * (p >> 1));
s += (p % 2 == 0) ? x >> 1 : x;
return s;
}
I ran this with the same number of loops as the earlier tests. Here's what I saw:
$ time ./addm
real 0m0.158s
user 0m0.093s
sys 0m0.047s
I'm not sure if this can be considered a variation of the formula for your purposes. In any case, it was an interesting exercise for me.
Upvotes: 3
Reputation: 4723
Split the range (from zero to the upper limit n) in a lower and an upper half. For each value in the lower half there's a value in the upper half that's n/2 larger; there are n/2 of them, so the sum of the upper half is the sum of the lower half + (n/2)^2.
In Python that would be:
def sum1(lower, upper):
if upper <= 0:
return 0
m, r = divmod(upper, 2)
return sum1(0, m) * 2 + m * m + r * upper - sum1(0, lower - 1)
Upvotes: 1
Reputation: 29539
I am not going to write the code for it, but this is the kind of thing that would directly scale with the number of cores you have working on the task.
Dividing the range into tasks and starting a thread to sum each subsection of the range would divide the time required for whatever implementation you choose by the number of cores (give or take).
You could also use SIMD extensions to facilitate the addition (vector addition) by writing out the data in the memory before hand. Taking it to another extreme, you could actually use the GPU to calculate the addition of subranges (but you'd have to have a big enough range to make it worth the overhead), making it stupid fast; as this question is as simple as you can get with no dependencies whatsoever between the instructions.
Upvotes: 0