logicOnAbstractions

Reputation: 2580

Memory management for python scripts

So I'm trying to solve some problems from Project Euler in python. I'm currently working on Problem 92, square digit chains. Basically, the idea is that if you take any integer and repeatedly sum the squares of its digits (e.g. 42 → 4^2 + 2^2 = 20, then 2^2 + 0^2 = 4, etc.), you always end up at either 1 or 89.

I am trying to write a program that can compute how many numbers in the range 1 to 10^K end up at 89 and how many end up at 1. I am not trying to store which integers end up where, only how many. The goal is to be able to do that for the largest K possible. (This is a challenge from HackerRank, for those curious.)

In order to do this for large K within my lifetime, I need to use caching. But then that's a balancing act between caching (which eventually takes up lots of RAM) and computing time.

My problem is that I eventually run out of memory. So I have tried to cap the length of the cache that I am using. However, I still run out of memory, and I cannot seem to find what is causing it.

I am running it in PyCharm on Ubuntu 14.04 LTS.

My question:

Is there a way to check what is taking up my RAM? Is there some tool (or script) that would allow me to monitor memory use by the variables within my program? Or am I wrong in assuming that if I run out of RAM, it is necessarily because some variable in my program is too large? I have to admit I am not all that clear on the fine details of memory use within a program...

EDIT: I run out of memory when K = 8, i.e. for integers up to 10^8, which is not so large. I also did testing below that (10^7 terminates, but takes some time and uses more memory than smaller computations). And it doesn't seem that capping my cache size makes a difference...

Upvotes: 0

Views: 152

Answers (3)

David Hammen
David Hammen

Reputation: 33106

This is an extended comment on the answers by Mathias Rav and John Coleman. I was going to make this a community wiki answer. John Coleman said not to do so, so I'm not.


I'll start with John Coleman's answer.

cache = {}

def helper(n):
    if n == 1 or n == 89:
        return n
    elif n in cache:
        return cache[n]
    else:
        ss = sum(int(d)**2 for d in str(n))
        v = helper(ss)
        cache[n] = v
        return v

def f(n):
    ss = sum(int(d)**2 for d in str(n))
    return helper(ss)

A small thing that will speed things up a bit is to avoid that first if in helper(n) by initializing cache to {1:some_value, 89:some_other_value}. The obvious initialization is {1:1, 89:89}. A less obvious, but ultimately faster initialization is {1:False, 89:True}. This enables changing if f(i) == 89: total += 1 to if f(i): total += 1.
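Concretely, the changed pieces look like this (a sketch of just that modification applied to the code above; the 1..1000 range is only for illustration):

```python
# Seed the cache with the two terminal values; True means "ends at 89".
cache = {1: False, 89: True}

def helper(n):
    if n in cache:   # the base cases are now ordinary cache hits
        return cache[n]
    v = helper(sum(int(d)**2 for d in str(n)))
    cache[n] = v
    return v

def f(n):
    return helper(sum(int(d)**2 for d in str(n)))

# The counting test simplifies from `if f(i) == 89:` to `if f(i):`.
total = sum(1 for i in range(1, 1001) if f(i))
print(total)
```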

Another small thing that sometimes helps is to get rid of the recursion; that's not the case here, though. To get rid of the recursion, we'd have to do something along the lines of

def helper(n):
    l = []
    while n not in cache :
        l.append(n)
        n = sum(int(d)**2 for d in str(n))
    v = cache[n]
    for k in l : 
        cache[k] = v
    return v

The problem is that almost all of the numbers encountered by f(n) will already be in the cache, thanks to how helper is called from f(n). Getting rid of the recursion needlessly creates a list, almost always empty, that then needs to be garbage collected.

The big issue with John Coleman's answer is the calculation of the sum of the square of the digits via sum(int(d)**2 for d in str(n)). While very pythonic, this is extremely expensive. I'll start by changing the variable ss in helper and in f into a function:

def ss(n):
    return sum(int(d)**2 for d in str(n))

This alone does nothing for performance. In fact, it hurts performance. Function calls are expensive in python. By making this a function, we can do some non-pythonic things by replacing the string operations with integer arithmetic:

def ss(n):
    s = 0
    while n != 0:
        d = n % 10
        n = n // 10
        s += d**2
    return s

The speedup here is quite significant; I get a 30% reduction in computation time. That's not great. There's another problem: the use of the exponentiation operator. In almost any language but Fortran and Matlab, using d*d is much faster than d**2. That's certainly the case in python. That simple change almost halves the execution time on top of that already significant 30% reduction.
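The d**2 versus d*d gap is easy to check on your own machine with a timeit micro-benchmark (exact timings will of course differ):

```python
import timeit

# Two spellings of the same arithmetic. In CPython, ** dispatches a
# generic power operation while * is a single fast multiply, so d*d
# typically wins by a wide margin for small integers.
t_pow = timeit.timeit("d**2", setup="d = 7", number=10**6)
t_mul = timeit.timeit("d*d", setup="d = 7", number=10**6)
print("d**2: %.3fs   d*d: %.3fs" % (t_pow, t_mul))

# The two forms agree for every digit, so the swap is safe.
assert all(d*d == d**2 for d in range(10))
```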

Putting this all together yields

cache = {1:False, 89:True}

def ss (n):
    s = 0
    while n != 0:
        d = n % 10
        n = n // 10
        s += d*d
    return s

def helper(n):
    if n in cache:
        return cache[n]
    else:
        v = helper(ss(n))
        cache[n] = v
        return v

def f(n):
    return helper(ss(n))

def freq89(n):
    total = 0
    for i in range(1,n+1):
        if f(i): total += 1
    return total/n

print (freq89(int(1e7)))


I have yet to take advantage of Mathias Rav's answer. In this case, it will make sense to get rid of the recursion. It will also help to embed the loop over the initial range inside of the function that initializes the cache (function calls are expensive in python).

N = int(1e7)
cache = {1:False, 89:True}

def ss(n):
    s = 0
    while n != 0:
        d = n % 10
        n //= 10
        s += d*d
    return s

def initialize_cache(maxsum):
    for n in range(1,maxsum+1):
        l = []
        while n not in cache:
            l.append(n)
            n = ss(n)
        v = cache[n]
        for k in l:
            cache[k] = v

def freq89(n):
    total = 0
    for i in range(1,n):
        if cache[ss(i)]:
            total += 1
    return total/n

maxsum = 81*len(str(N-1))
initialize_cache(maxsum)
print (freq89(N))


The above takes about 16.5 seconds on my computer to calculate the ratio for numbers between 1 (inclusive) and 10000000 (exclusive). This is almost three times faster than the initial version (44.7 seconds). It takes a bit over three minutes for the above to calculate the ratio for numbers between 1 (inclusive) and 1e8 (exclusive).


It turns out I'm not done. There's no need to calculate the sum of the squares of the digits of (for example) 12345679 digit by digit when the program just did that for 12345678: since (d+1)^2 = d^2 + 2d + 1, incrementing the last digit just adds 2d+1 to the previous sum. A shortcut that reduces the calculation time for nine out of ten use cases pays off. The function ss(n) becomes a bit more complex:

prevn = 0 
prevd = 0 
prevs = 0 

def ss(n):
    global prevn, prevd, prevs
    d = n % 10
    if (n == prevn+1) and (d == prevd+1):
        s = prevs + 2*prevd + 1 
        prevs = s 
        prevn = n 
        prevd = d 
        return s
    s = 0 
    prevn = n 
    prevd = d 
    while n != 0:
        d = n % 10
        n //= 10
        s += d*d 
    prevs = s 
    return s

With this, calculating the ratio for numbers up to (but not including) 1e7 takes 6.6 seconds, 68 seconds for numbers up to but not including 1e8.
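Since this version keeps state across calls and only takes the fast path when numbers arrive in ascending order, it's worth verifying it against the straightforward implementation. Here is a self-contained sanity check (ss is reproduced from above; ss_naive is my name for the baseline):

```python
prevn = prevd = prevs = 0

def ss(n):
    # Incremental digit-square sum: when n is prevn+1 and the last digit
    # also went up by one, (d+1)^2 - d^2 = 2d + 1, so just adjust prevs.
    global prevn, prevd, prevs
    d = n % 10
    if n == prevn + 1 and d == prevd + 1:
        s = prevs + 2*prevd + 1
        prevn, prevd, prevs = n, d, s
        return s
    prevn, prevd = n, d
    s = 0
    while n != 0:
        d = n % 10
        n //= 10
        s += d*d
    prevs = s
    return s

def ss_naive(n):
    return sum(int(c)**2 for c in str(n))

# Feed the numbers in ascending order, exactly as freq89 does.
assert all(ss(i) == ss_naive(i) for i in range(1, 100000))
print("incremental ss matches the naive version")
```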

Upvotes: 1

John Coleman

Reputation: 51988

This is a variation of Mathias Rav's excellent idea, but it keeps your idea of using a recursive function with memoization. The idea is to use a helper function to do the heavy lifting and have the main function just do the first step of the iteration. That very first step reduces the problem size to one for which caching is useful, so the cache remains small. I was able to do all numbers up to 10**8 in about 10 minutes (the overhead due to the recursion makes this solution less efficient than Mathias' solution):

cache = {}

def helper(n):
    if n == 1 or n == 89:
        return n
    elif n in cache:
        return cache[n]
    else:
        ss = sum(int(d)**2 for d in str(n))
        v = helper(ss)
        cache[n] = v
        return v

def f(n):
    ss = sum(int(d)**2 for d in str(n))
    return helper(ss)

def freq89(n):
    total = 0
    for i in range(1,n+1):
        if f(i) == 89: total += 1
    return total/n

Upvotes: 2

Mathias Rav

Reputation: 2973

I would suggest testing various cache sizes to see if it is actually beneficial to have as large a cache as possible.

If you take any 10-digit number and compute the sum of squares of its digits, the sum will be at most 10*9*9 = 810. Thus, if you cache the result for numbers 1 to 810, then you should be able to process all numbers with between 4 and 10 digits without recursion.

In this way, I have processed the first 10^8 numbers in around 6 minutes with memory usage staying constant at roughly 10 MB.
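A minimal sketch of this scheme (names are mine; the answer itself doesn't include code): precompute the terminal value for every possible first digit-square sum, so every number afterwards needs exactly one sum plus one table lookup.

```python
def digit_square_sum(n):
    s = 0
    while n:
        n, d = divmod(n, 10)
        s += d*d
    return s

# For numbers below 10**10, the first digit-square sum is at most
# 10 * 9*9 = 810, so a table of that size covers everything.
MAXSUM = 810
ends_at_89 = [False] * (MAXSUM + 1)
for start in range(1, MAXSUM + 1):
    n = start
    while n != 1 and n != 89:
        n = digit_square_sum(n)
    ends_at_89[start] = (n == 89)

def chain_ends_at_89(n):
    return ends_at_89[digit_square_sum(n)]

# Memory use stays constant regardless of the range being counted.
count = sum(chain_ends_at_89(i) for i in range(1, 10**5))
print(count)
```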

Upvotes: 3
