cmjdxy

Reputation: 396

Why does my Python loop tend to consume all the memory?

I want to generate tuples for a fixed amount of time and keep them in a set. However, the program seems to consume all available memory if given enough time.

I have tried two methods: deleting the newly created variables with del, and calling gc.collect(). Neither worked. If I only generate the tuples without keeping them, the program's memory usage stays bounded.

generate and keep: gk.py

import gc
import time
from memory_profiler import profile
from random import sample
from sys import getsizeof


@profile
def loop(limit):
    t = time.time()
    i = 0
    A = set()
    while True:
        i += 1
        duration = time.time() - t
        a = tuple(sorted(sample(range(200), 100)))
        A.add(a)
        if not i % int(1e4):
            print('step {:.2e}...'.format(i))
        if duration > limit:
            print('done')
            break
        # method 1: delete the variables
#        del duration, a
        # method 2: use gc
#        gc.collect()
    memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
             getsizeof(a) + getsizeof(limit) + getsizeof(A)
    print('memory consumed: {:.2e}MB'.format(memory/2**20))
    pass


def main():
    limit = 300
    loop(limit)
    pass


if __name__ == '__main__':
    print('running...')
    main()

generate and not keep: gnk.py

import time
from memory_profiler import profile
from random import sample
from sys import getsizeof


@profile
def loop(limit):
    t = time.time()
    i = 0
    while True:
        i += 1
        duration = time.time() - t
        a = tuple(sorted(sample(range(200), 100)))
        if not i % int(1e4):
            print('step {:.2e}...'.format(i))
        if duration > limit:
            print('done')
            break
    memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
             getsizeof(a) + getsizeof(limit)
    print('memory consumed: {:.2e}MB'.format(memory/2**20))
    pass


def main():
    limit = 300
    loop(limit)
    pass


if __name__ == '__main__':
    print('running...')
    main()

Use "mprof" (requires the memory_profiler module) in a cmd/shell to check memory usage:

mprof run my_file.py
mprof plot

result of gk.py

memory consumed: 4.00e+00MB
Filename: gk.py

Line #    Mem usage    Increment   Line Contents
================================================
    12     32.9 MiB     32.9 MiB   @profile
    13                             def loop(limit):
    14     32.9 MiB      0.0 MiB       t = time.time()
    15     32.9 MiB      0.0 MiB       i = 0
    16     32.9 MiB      0.0 MiB       A = set()
    17     32.9 MiB      0.0 MiB       while True:
    18    115.8 MiB      0.0 MiB           i += 1
    19    115.8 MiB      0.0 MiB           duration = time.time() - t
    20    115.8 MiB      0.3 MiB           a = tuple(sorted(sample(range(200), 100)))
    21    115.8 MiB      2.0 MiB           A.add(a)
    22    115.8 MiB      0.0 MiB           if not i % int(1e4):
    23    111.8 MiB      0.0 MiB               print('step {:.2e}...'.format(i))
    24    115.8 MiB      0.0 MiB           if duration > limit:
    25    115.8 MiB      0.0 MiB               print('done')
    26    115.8 MiB      0.0 MiB               break
    27                                     # method 1: delete the variables
    28                             #        del duration, a
    29                                     # method 2: use gc
    30                             #        gc.collect()
    31                                 memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
    32    115.8 MiB      0.0 MiB                getsizeof(a) + getsizeof(limit) + getsizeof(A)
    33    115.8 MiB      0.0 MiB       print('memory consumed: {:.2e}MB'.format(memory/2**20))
    34    115.8 MiB      0.0 MiB       pass

result of gnk.py

memory consumed: 9.08e-04MB
Filename: gnk.py

Line #    Mem usage    Increment   Line Contents
================================================
    11     33.0 MiB     33.0 MiB   @profile
    12                             def loop(limit):
    13     33.0 MiB      0.0 MiB       t = time.time()
    14     33.0 MiB      0.0 MiB       i = 0
    15     33.0 MiB      0.0 MiB       while True:
    16     33.0 MiB      0.0 MiB           i += 1
    17     33.0 MiB      0.0 MiB           duration = time.time() - t
    18     33.0 MiB      0.1 MiB           a = tuple(sorted(sample(range(200), 100)))
    19     33.0 MiB      0.0 MiB           if not i % int(1e4):
    20     33.0 MiB      0.0 MiB               print('step {:.2e}...'.format(i))
    21     33.0 MiB      0.0 MiB           if duration > limit:
    22     33.0 MiB      0.0 MiB               print('done')
    23     33.0 MiB      0.0 MiB               break
    24                                 memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
    25     33.0 MiB      0.0 MiB                getsizeof(a) + getsizeof(limit)
    26     33.0 MiB      0.0 MiB       print('memory consumed: {:.2e}MB'.format(memory/2**20))
    27     33.0 MiB      0.0 MiB       pass

I have two questions:

  1. Both programs consumed more memory than their variables occupy. "gk.py" consumed 115.8 MiB, while its variables occupy 4.00 MB. "gnk.py" consumed 33.0 MiB, while its variables occupy 9.08e-04 MB. Why did the programs consume more memory than their variables occupy?

  2. The memory consumed by "gk.py" increases linearly with time, while the memory consumed by "gnk.py" stays constant. Why does this happen?

Any help would be appreciated.

Upvotes: 1

Views: 432

Answers (1)

araraonline

Reputation: 1562

Since the set keeps growing for as long as the loop runs, it will eventually consume all available memory.
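To see that growth directly, here is a minimal sketch using the standard-library tracemalloc module. The helper name kept_bytes and the tuple counts are my own choices for illustration and are not part of the question's code.

```python
import tracemalloc
from random import sample


def kept_bytes(n):
    """Return the bytes still allocated after keeping n tuples in a set."""
    tracemalloc.start()
    kept = set()
    for _ in range(n):
        # Same tuple construction as in gk.py
        kept.add(tuple(sorted(sample(range(200), 100))))
    # Read the counter while `kept` is still alive
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


if __name__ == '__main__':
    for n in (1000, 2000, 4000):
        print('{} tuples kept: {:.2f} MiB'.format(n, kept_bytes(n) / 2**20))
```

The reported figure should grow roughly in proportion to n, which matches the linear increase seen for gk.py.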

A rough estimate (from my computer):

10 seconds of code running ~ 5e4 tuples saved to the set
300 seconds of code running ~ 1.5e6 tuples saved to the set

1 tuple = 100 integers ~ 400 bytes

total:

1.5e6 * 400 bytes = 6e8 bytes = 600 MB filled in 300 s
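The arithmetic above can be checked with a short sketch. Note that sys.getsizeof is shallow: for a set it counts only the hash table, not the tuples stored in it, which is probably why the question's own total comes out at about 4 MB. deep_set_size is a hypothetical helper of mine that adds the tuples' sizes on top. The per-tuple figure depends on the build. On 64-bit CPython each tuple slot is an 8-byte pointer, so a 100-element tuple usually measures well above 400 bytes.

```python
from random import sample
from sys import getsizeof


def deep_set_size(tuples):
    """Shallow size of the set plus the shallow size of each tuple in it.

    The integers themselves are not counted: CPython caches small ints
    such as 0..199, so all tuples share the same int objects.
    """
    return getsizeof(tuples) + sum(getsizeof(t) for t in tuples)


if __name__ == '__main__':
    A = {tuple(sorted(sample(range(200), 100))) for _ in range(10000)}
    per_tuple = getsizeof(next(iter(A)))
    print('one tuple: {} bytes'.format(per_tuple))
    print('shallow set size: {:.2f} MiB'.format(getsizeof(A) / 2**20))
    print('set plus tuples: {:.2f} MiB'.format(deep_set_size(A) / 2**20))
    # Extrapolate to the ~1.5e6 tuples estimated for a 300 s run
    print('1.5e6 tuples: about {:.0f} MB'.format(1.5e6 * per_tuple / 1e6))
```

Whatever the exact per-tuple size on a given machine, the total scales with the number of tuples kept, so the conclusion of the estimate is unchanged.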

Upvotes: 2
