Reputation: 93
When this code runs, OSX notifies me that I'm out of application memory and pauses the application. The amount of space that is used by Python breaks 10 gigs very quickly. This code never hits Python's max levels of recursion, it should only hit 525 worst case, but due to caching it should be much smaller. I have a feeling that the list chain is getting copied with every level of recursion, but it seems like it's a global variable that should be shared with every call of collatz(). I've looked for similar questions on stackoverflow, but I haven't found any that are the same.
# The following iterative sequence is defined for the set of positive integers:
# n = n/2 (n is even)
# = 3n+1 (n is odd)
# Using the rule above and starting with 13, we generate the following sequence:
# 13,40,20,10,5,16,8,4,2,1
# It can be seen that this sequence (starting at 13 and finishing at 1) contains 10
# terms. Although it has not been proved yet (Collatz Problem), it is thought that
# all starting numbers finish at 1.
# Which starting number, under one million, produces the longest chain?
# Comments: I'm using dynamic programming to avoid computing the same values over and over
# again.
lim = 1000000
chain = [1,1]
maxL = 0
def collatz(i):
if i >= len(chain): chain.extend([None]*(i-len(chain)+1))
if chain[i] == None: chain[i] = 1 + collatz(i/2 if i%2==0 else 3*i+1)
return chain[i]
for i in range(1, lim):
maxL = i if (collatz(i) > chain[maxL]) else maxL
print i, chain[maxL]
print "collatz chain of {} is {} terms long.".format(maxL, collatz(maxL))
EDIT: See my working dictionary implementation here: https://stackoverflow.com/a/20229855/1084773.
Upvotes: 3
Views: 1487
Reputation: 93
The list implementation used a horrible amount of extra memory (as pointed out by Hyperboreus, thanks!). Switching to a dictionary reduced the memory usage from ~212GB to ~70MB with 32byte ints and solved the problem in two seconds.
lim = 1000000
chain = {1:1}
maxL = 1
def collatz(i):
if i not in chain: chain[i] = 1 + collatz(i/2 if i%2==0 else 3*i+1)
return chain[i]
for i in range(1, lim): maxL = i if (collatz(i) > chain[maxL]) else maxL
print "collatz chain of {} is {} terms long.".format(maxL, collatz(maxL))
Upvotes: 1
Reputation: 32429
To see your memory error, run your code with limit = 100 and then print out chain.
Maybe you want to serialize your recursive code:
lengths = {1: 1}
def collatz(i):
i0 = i
acc = 0
while True:
if i in lengths:
lengths[i0] = acc + lengths[i]
return acc + lengths[i]
acc += 1
i = (i * 3 + 1) if i % 2 else i // 2
longest = 1
for i in range(1, 1000000):
c = collatz(i)
if c > longest:
longest = c
print(i, c)
This surely can still be optimized in many ways, but it yields the expected result in 4 seconds.
EDIT:
Your approach creates a list with a length of the highest term ever reached. For limit = 100, this is 9232. Which is not that much. But for limit = 1000000 it is 56991483520 (the chain starting with 704511), which is quite a whopper. If it were a mere array over int32, this would already be 212 GB of memory, and in reality it is more than that.
Here the troublesome chain: 704511, 2113534, 1056767, 3170302, 1585151, 4755454, 2377727, 7133182, 3566591, 10699774, 5349887, 16049662, 8024831, 24074494, 12037247, 36111742, 18055871, 54167614, 27083807, 81251422, 40625711, 121877134, 60938567, 182815702, 91407851, 274223554, 137111777, 411335332, 205667666, 102833833, 308501500, 154250750, 77125375, 231376126, 115688063, 347064190, 173532095, 520596286, 260298143, 780894430, 390447215, 1171341646, 585670823, 1757012470, 878506235, 2635518706, 1317759353, 3953278060, 1976639030, 988319515, 2964958546, 1482479273, 4447437820, 2223718910, 1111859455, 3335578366, 1667789183, 5003367550, 2501683775, 7505051326, 3752525663, 11257576990, 5628788495, 16886365486, 8443182743, 25329548230, 12664774115, 37994322346, 18997161173, 56991483520, 28495741760, 14247870880, 7123935440, 3561967720, 1780983860, 890491930, 445245965, 1335737896, 667868948, 333934474, 166967237, 500901712, 250450856, 125225428, 62612714, 31306357, 93919072, 46959536, 23479768, 11739884, 5869942, 2934971, 8804914, 4402457, 13207372, 6603686, 3301843, 9905530, 4952765, 14858296, 7429148, 3714574, 1857287, 5571862, 2785931, 8357794, 4178897, 12536692, 6268346, 3134173, 9402520, 4701260, 2350630, 1175315, 3525946, 1762973, 5288920, 2644460, 1322230, 661115, 1983346, 991673, 2975020, 1487510, 743755, 2231266, 1115633, 3346900, 1673450, 836725, 2510176, 1255088, 627544, 313772, 156886, 78443, 235330, 117665, 352996, 176498, 88249, 264748, 132374, 66187, 198562, 99281, 297844, 148922, 74461, 223384, 111692, 55846, 27923, 83770, 41885, 125656, 62828, 31414, 15707, 47122, 23561, 70684, 35342, 17671, 53014, 26507, 79522, 39761, 119284, 59642, 29821, 89464, 44732, 22366, 11183, 33550, 16775, 50326, 25163, 75490, 37745, 113236, 56618, 28309, 84928, 42464, 21232, 10616, 5308, 2654, 1327, 3982, 1991, 5974, 2987, 8962, 4481, 13444, 6722, 3361, 10084, 5042, 2521, 7564, 3782, 1891, 5674, 2837, 8512, 4256, 2128, 1064, 532, 266, 133, 400, 200, 100, 50, 25, 76, 38, 19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
Using your exact recursive idea, but using a dict (which is sparse) instead of a list (runs with no probs):
lengths = {1: 1}
def collatz(i):
if i in lengths: return lengths [i]
j = (i * 3 + 1) if i % 2 else i // 2
c = collatz (j) + 1
lengths [i] = c
return c
longest = 1
for i in range(1, 1000000):
c = collatz(i)
if c > longest:
longest = c
print(i, c)
Upvotes: 4