Identifying memory leak in python - Memory Profiler

Question

I'm new to Python and I'm currently working on solving problems to improve my coding skills. Currently I'm working on a question where I had to stable sort the input and output the reverse-stable sorted values. I've programmed it and executed the code in the website's online judge and for one test case (don't know the test case), I got Memory Limit Exceeded error. So after some research, I understood that there is a memory leak happening in the code and the code is not fully efficient. So I have installed the python's memory_profiler to monitoring memory consumption of a process as well as line-by-line analysis of memory consumption of my code.

Please find below the input details, code, output and the memory profiling analysis taken from the memory_profiler.

Input:

Code:

from collections import OrderedDict
@profile
def test_1():
    print "Enter the number: "
    n = raw_input()
    k = []
    v = []
    print "Enter ID and M: "
    for i in range(0,int(n)):
        a, b = raw_input().split(' ')
        k.append(a)
        v.append(b)
    d = OrderedDict(zip(k,v))
    sorted_items = sorted(d.items(), key=lambda (k,v):int(v), reverse=True)
    for i, j in sorted_items:
        print i, j

if __name__ == '__main__':
        test_1()

Output:

    Line #    Mem usage    Increment   Line Contents
================================================
     2   10.520 MiB    0.000 MiB   @profile
     3                             def test_1():
     4   10.531 MiB    0.012 MiB        print "Enter the number: "
     5   10.551 MiB    0.020 MiB        n = raw_input()
     6   10.551 MiB    0.000 MiB        k = []
     7   10.551 MiB    0.000 MiB        v = []
     8   10.551 MiB    0.000 MiB        print "Enter ID and M: "
     9   10.551 MiB    0.000 MiB        for i in range(0,int(n)):
    10   10.551 MiB    0.000 MiB                a, b = raw_input().split(' ')
    11   10.551 MiB    0.000 MiB                k.append(a)
    12   10.551 MiB    0.000 MiB                v.append(b)
    13
    14   10.551 MiB    0.000 MiB        d = OrderedDict(zip(k,v))
    15   10.555 MiB    0.004 MiB        sorted_items = sorted(d.items(), key=lambda (k,v):int(v), reverse=True)
    16   10.555 MiB    0.000 MiB        for i, j in sorted_items:
    17   10.555 MiB    0.000 MiB                print i, j

Expected Output (I'm able to get the desired output):

Is this code not efficient for higher input or higher numbers? from the analysis I could see that there is only less memory utilized but for that particular test case, I could see that memory utilization is more than 16MB. Can someone tell me where am I doing wrong. Is my approach wrong or flow is wrong? Could you please tell me why I'm not able to get the output as expected. Thanks in advance. Any help would be much appreciated.

gil · Accepted Answer

I was writing a comment but it was getting too long, so I think I might as well upgrade it to an answer.

First off, the profile decorator uses quite a bit of memory just by itself. As you can see:

from memory_profiler import profile
@profile
def foo():
    pass

and I get

Line #    Mem usage    Increment   Line Contents
================================================
    2     28.5 MiB      0.0 MiB   @profile
    3                             def foo():
    4     28.5 MiB      0.0 MiB       pass

Your number will probably vary (I'm running Python 3, in a IDE no less) but it's basically the same thing with your function. Almost all memory usage comes from the @profile line (10.520 MiB) and what's added by your function (see the Increment column) is trivial (0.36 MiB).

The upshot is that you shouldn't have any problem by the looks of it (if what you posted is already your entire code, and I suppose it is). I don't know what test case can possibly give you Memory Limit Exceeded. We really need to know what that test case is to investigate the problem.

That said, one improvement can make your code more efficient, especially for a large number of inputs. You don't need to build intermediate lists (k and v in your code). Write to the dict directly:

d = OrderedDict()
for i in range(int(n)):  # Note you don't need range(0, x); just range(x)
    a, b = raw_input().split()  # No need for an argument to split, either
    d[a] = b

Even better, you can avoid a for loop and use a more efficient generator expression:

d = OrderedDict(raw_input().split() for _ in range(int(n)))

A generator expression is an expression of the form (foo something_like_a_for_loop) (formal description here); the surrounding parentheses can be omitted if it's the sole argument to a function. It's like a list in many ways: you can iterate it using for, you can get a list out of it using list, you can use it whenever an iterator is expected. But it takes a lot less space than an equivalent list when the list is long. (But there are also differences: a gen expr can be iterated over only once, can't be indexed and doesn't have len, etc. You can read more about it here.)

There are other small improvements you can make. All incorporated into the rewrite below

def test_1():
    n = int(raw_input('Enter the number: '))
    d = OrderedDict(raw_input().split() for _ in range(n))
    sorted_items = sorted(d.items(), key=lambda k_v: int(k_v[1]), reverse=True)
    for i, j in sorted_items:
        print i, j

Identifying memory leak in python - Memory Profiler

Answers (1)

Related Questions