Pie86

Reputation: 505

Is this a memory leak? (python + numpy)

The following code fills all my memory:

from sys import getsizeof
import numpy

# from http://stackoverflow.com/a/2117379/272471
def getSize(array):
    # Approximate size of a list: the list object itself plus one
    # element's size for each entry (assumes homogeneous elements).
    return getsizeof(array) + len(array) * getsizeof(array[0])


class test():
    def __init__(self):
        pass
    def t(self):
        temp = numpy.zeros([200, 100, 100])
        A = numpy.zeros([200], dtype=numpy.float64)
        for i in range(200):
            A[i] = numpy.sum(temp[i].diagonal())
        return A

a = test()
memory_usage("before")
c = [a.t() for i in range(100)]
del a
memory_usage("After")
print("Size of c:", float(getSize(c))/1000.0)

The output is:

('>', 'before', 'memory:', 20588, 'KiB  ')
('>', 'After', 'memory:', 1583456, 'KiB  ')
('Size of c:', 8.92)

Why am I using ~1.5 GB of memory if c is only ~9 KB? Is this a memory leak? (Thanks)

The memory_usage function was posted on SO and is reproduced here for clarity:

def memory_usage(text = ''):
    """Memory usage of the current process in kilobytes."""
    status = None
    result = {'peak': 0, 'rss': 0}
    try:
        # This will only work on systems with a /proc file system
        # (like Linux).
        status = open('/proc/self/status')
        for line in status:
            parts = line.split()
            key = parts[0][2:-1].lower()
            if key in result:
                result[key] = int(parts[1])
    finally:
        if status is not None:
            status.close()
    print('>', text, 'memory:', result['rss'], 'KiB  ')
    return result['rss']

Upvotes: 2

Views: 4481

Answers (3)

Pie86

Reputation: 505

The implementation of diagonal() failed to decrement a reference count. The issue had already been fixed upstream, but the fix didn't make it into 1.7.0.

Upgrading to 1.7.1 solves the problem! The release notes list the relevant tickets, notably issue 2969.

The solution was provided by Sebastian Berg and Charles Harris on the NumPy mailing list.
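
A minimal sketch (not from the original answer) for checking your own setup: on an affected 1.7.0 install, resident memory grows on every pass even though nothing is kept alive; on 1.7.1 it stays flat.

import numpy

print(numpy.__version__)  # the refcount bug only affects 1.7.0

# Each pass builds and immediately discards a large array via diagonal();
# watch the process RSS (e.g. in top) while this runs.
for _ in range(100):
    numpy.zeros([200, 100, 100]).diagonal()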

Upvotes: 6

jcr

Reputation: 1015

I don't think sys.getsizeof returns what you expect.

Your arrays hold 64-bit floats (8 bytes per element), so the float64 array of shape [200, 100, 100] built on each of the 100 calls takes up (at least)

8 * 200 * 100 * 100 * 100 / (2.0**30) ≈ 1.49 GiB

in total. So at a minimum you should expect ~1.5 GB for those 100 arrays; the last few hundred MB are the integers used for indexing the large NumPy data and the 100 objects.
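
Spelled out in code (plain arithmetic, nothing NumPy-specific):

per_call = 200 * 100 * 100 * 8  # one float64 array of that shape, in bytes
total = per_call * 100          # 100 calls to t()
print(per_call / 2.0**20)       # ~15.26 MiB per array
print(total / 2.0**30)          # ~1.49 GiB overall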

It seems that sys.getsizeof always returns 80 no matter how large a NumPy array is:

sys.getsizeof(np.zeros([200, 1000, 100]))  # returns 80
sys.getsizeof(np.zeros([20, 100, 10]))     # returns 80
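
If you want the size of the data buffer rather than the Python object header, ndarray.nbytes reports it directly (a small sketch, not from the original answer):

import sys
import numpy as np

A = np.zeros([200, 100, 100])
print(sys.getsizeof(A))  # ~80 on this NumPy version: the object header only
print(A.nbytes)          # 16000000: the size of the actual data buffer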

In your code you delete a, which is a tiny factory object whose t method returns huge NumPy arrays; you store these huge arrays in a list called c. Try deleting c as well, and you should regain most of your RAM.
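
A quick way to test that, reusing the memory_usage helper and the variables from the question (gc.collect() just makes the release explicit):

import gc

del c
gc.collect()
memory_usage("after deleting c")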

Upvotes: 0

glglgl

Reputation: 91149

Python allocates memory from the OS when it needs some.

If it doesn't need it any longer, it may or may not return it.

But if it doesn't return it, the memory will be reused on subsequent allocations. You should check that; presumably the memory consumption won't increase any further.
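
One way to check this with the classes from the question: run the same workload twice and compare RSS after each pass; if freed memory is merely cached and reused, the second pass should not add another ~1.5 GB on top of the first.

a = test()
c = [a.t() for i in range(100)]
memory_usage("after first pass")
c = [a.t() for i in range(100)]    # the old list is released and reusable here
memory_usage("after second pass")  # roughly unchanged if nothing leaks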

About your estimate of the memory consumption: as azorius already wrote, each temp array consumes 16 MB, while each returned A array consumes about 200 * 8 = 1600 bytes (plus about 40 bytes for internal reasons). With 100 of them you are at about 164,000 bytes (plus some for the list).
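
The same estimate, spelled out (the ~40-byte per-object overhead is an approximation that varies by platform and NumPy version):

temp_bytes = 200 * 100 * 100 * 8  # 16,000,000 bytes = 16 MB per temp array
A_bytes = 200 * 8 + 40            # ~1640 bytes per returned A array
print(100 * A_bytes)              # 164000 bytes for the 100 arrays in c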

Besides that, I have no explanation for the memory consumption you are seeing.

Upvotes: 1
