Reputation: 20928
I'm looking for an efficient way to iterate over the infinite non-decreasing sequence defined by a^2+b^2
where a
, b
are both positive integers. What I mean by iterate here, is that given an existing list of n
entries, the algorithm should efficiently (I'm hoping for O(log(m))
, where m=a^2+b^2
) find the next entry.
The start of this list is: 1^2+1^2=2, 1^2+2^2=5, 2^2+2^2=8, 1^2+3^2=10, 2^2+3^2=13, 1^2+4^2=17, ...
Here's the python code used to generate these entries (the first 100 are correct):
xs=[]
for i in range(1, 100):
for j in range(i, 100):
xs.append((i**2+j**2, i, j))
xs.sort()
I've looked at the start of the list and can't see any pattern at all. Anyone know an algorithm to do this?
[edit] Upon some searching, I found Cornacchia's algorithm which requires computing a quadratic residue. However I'm still hoping for something better, as we already know the previous numbers during iteration.
Upvotes: 6
Views: 594
Reputation: 17851
The isSumOfSquares
function returns True
if n can be written as a sum of squares greater than zero and False
otherwise, based on an algorithm from Edsgar Dijkstra's 1976 book The Discipline of Programming: x sweeps downward from the square root of n, decreasing when the sum of the squares is too large, and y sweeps upward from 1, increasing when the sum of the squares is too small.
from math import sqrt
def isSumOfSquares(n):
x, y = int(sqrt(n)), 1
while x >= y:
if x*x + y*y < n: y = y + 1
elif x*x + y*y > n: x = x - 1
else: return True
return False
filter(isSumOfSquares, range(1,100))
This returns the list [2, 5, 8, 10, 13, 17, 18, 20, 25, 26, 29, 32, 34, 37, 40, 41, 45, 50, 52, 53, 58, 61, 65, 68, 72, 73, 74, 80, 82, 85, 89, 90, 97, 98]. I discussed a similar question at my blog.
Upvotes: 2
Reputation: 27599
In thinking about this I've assumed two things. The first is that for any a^2+b^2 that a>b. This is just because trivially (a,b) is in the same place in the ordering as (b,a) so you don't need to consider both.
Also the explanations are easier if a or b is allowed to be 0. These cases can easily be taken out in the real algorithm.
My thinking is that you should be able to do something based on looking at m=a+b
and saying that for a given m the smallest sum of squares is m^2/2
(when they are equal) and the largest is (m-1)^2
with the obvious ordering between the two. You could then do a sort of merge sort between these to find the next one.
My thinking is that if you look at m=a+b
then for any given m you can create an ordered list of the sums of squares for all a+b=m
. The reason is that we know that the sum is a maximum when a
or b
is 0 and a minimum when a=b=m/2
. This means we can trivially generate a series of lists of ordered sums of squares L(m)
.
We also know the maximum and minimum entries in L(m) so we don't need to start considering those lists until we are in that range.
It will still mean potentially a lot of list to consider at higher values but it is still better than a more brute force approach.
I'm not sure if there is a better way once you have these list though to work out which one you need to select from. I think probably not though you might be able to make some guesses based on the distributions of numbers in the lists but I can't think of a way to formalise them in a nice way.
To give a more visual example of things then
List Lowest value Highest value
L(0) 0 (0,0) 0 (0,0)
L(1) 1 (1,0) 1 (1,0)
L(2) 2 (1,1) 4 (2,0)
L(3) 5 (2,1) 9 (3,0)
L(4) 8 (2,2) 16 (4,0)
L(5) 13 (3,2) 25 (5,0)
L(6) 18 (3,3) 36 (6,0)
L(7) 25 (4,3) 49 (7,0)
L(8) 32 (4,4) 64 (8,0)
So to start at 0 you only need to consider items in L(0), then that is exhausted and you move onto L(1). Then that is exhausted, etc. THen while working through L(3) you find that you are looking at the last of L(3) is 9 which is greater than the lowest in L(4) so you look at both those lists.
When you have just put in (5,0) and (4,3) for 25 you then will be considering things in L(6) and L(7). As you go higher you will be considering more lists as the lists start to contain more items over a larger range (eg the number 10,000 is in the range for L(100) (just about) and L(141) (which starts at 9941).
I should also note that you don't need to calculate the whole list in advance, you can easily make it a lazy iterator to calculate only the number at the front of the list since that is the only one you ever want.
Upvotes: 0