Reputation: 938
I have a problem where I need to (pretty sure at least) go through the entire list to solve. The question is to figure out the largest number of consecutive numbers in a list that add up to another (greater) element in that list. If there aren't any then we just take the largest value in the list as the candidate summation and 1 as the largest consecutive number of elements.
My general code works, but not too well for large lists (>500,000 elements). I am just looking for tips as to how I could approach the problem differently. My current approach:
L = [1,2,3,4,5,6,7,8,9,10]
candidate_sum = L[-1]
largest_count = 1
N = len(L)
i = 0
while i < N - 1:
s = L[i]
j = 0
while s <= (N - L[i + j + 1]):
j += 1
s += L[i+j]
if s in L and (j+1) > largest_count:
largest_count = j+1
candidate_sum = s
i+=1
In this case, the answer would be [1,2,3,4] as they add up to 10 and the length is 4 (obviously this example L is a very simple example).
I then made it faster by changing the initial while loop condition to:
while i < (N-1)/largest_count
Not a great assumption, but basic thinking that the distribution of numbers is somewhat uniform, so two numbers on the second half of the list are on average bigger than the final number in the list, and therefore are disqualified.
I'm just looking for:
Upvotes: 5
Views: 299
Reputation:
Strictly ascending: no duplication of elements or subsequences, single possible solution
Arbitrary-spaced: no arithmetical shortcuts, has to operate brute-force
Efficient C implementation using pointer arithmetic, quasi polymorphic over numeric types:
#define TYPE int
int max_subsum(TYPE arr [], int size) {
int max_length = 1;
TYPE arr_fst = * arr;
TYPE* num_ptr = arr;
while (size --) {
TYPE num = * num_ptr++;
TYPE* lower = arr;
TYPE* upper = arr;
TYPE sum = arr_fst;
int length = 1;
for (;;) {
if (sum > num) {
sum -= * lower++;
-- length;
}
else if (sum < num) {
sum += * ++upper;
++ length;
}
else {
if (length > max_length) {
max_length = length;
}
break;
}
}
}
return max_length;
}
The main loop over the num
s is parallelizable. Relatively straight-forward translation into Python 3 using the dynamic-array list type for arr
and the for each
loop:
def max_subsum(arr):
max_len = 1
arr_fst = arr[0]
for n in arr:
lower = 0
upper = 0
sum = arr_fst
while True:
if sum > n:
sum -= arr[lower]
lower += 1
elif sum < n:
upper += 1
sum += arr[upper]
else:
sum_len = upper - lower + 1
if sum_len > max_len:
max_len = sum_len
break
return max_len
This max_subsum
is a partial function; Python lists can be empty. The algorithm is appropriate for C-like compiled imperative languages offering fast indexing and statically typed arithmetic. Both are comparatively expensive in Python. A (totally defined) algorithm rather similar to yours, using the set
data type for more performant universal quantification, and avoiding Python's dynamically typed arithmetic, can be more efficiently interpreted:
def max_subsum(arr):
size = len(arr)
max_len = 0
arr_set = set(arr)
for i in range(size):
sum = 0
sum_len = 0
for j in range(i, size):
sum_mem = sum + arr[j]
if num_mem not in arr_set:
break
sum = sum_mem
sum_len += 1
if sum_len > max_len:
max_len = sum_len
return max_len
Upvotes: 4
Reputation: 11075
I'm going to ignore the possibility of a changing target value, and let you figure that out, but to answer your question "is there a faster way to do it?" Yes: by using cumulative sums and some math to eliminate one of your loops.
import numpy as np
L = np.random.randint(0,100,100)
L.sort()
cum_sum = np.cumsum(L)
start = 0
end = 0
target = 200
while 1:
total = cum_sum [end-1] - (cum_sum [start-1] if start else 0)
if total == target:
break
elif total < target:
end += 1
elif total > target:
start += 1
if end >= len(L):
raise ValueError('something informative')
Upvotes: 2