Reputation: 396
I want to generate and keep a set of tuples in a certain time. Yet I found the program seemed to consume all the memory if given enough time.
I have tried two methods. One is delete the newly generated variables, the other is gc.collect(). But neither of them worked. If I just generate and not keep the tuples, the program would consume limited memory.
generate and keep: gk.py
import gc
import time
from memory_profiler import profile
from random import sample
from sys import getsizeof
@profile
def loop(limit):
t = time.time()
i = 0
A = set()
while True:
i += 1
duration = time.time() - t
a = tuple(sorted(sample(range(200), 100)))
A.add(a)
if not i % int(1e4):
print('step {:.2e}...'.format(i))
if duration > limit:
print('done')
break
# method 1: delete the variables
# del duration, a
# method 2: use gc
# gc.collect()
memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
getsizeof(a) + getsizeof(limit) + getsizeof(A)
print('memory consumed: {:.2e}MB'.format(memory/2**20))
pass
def main():
limit = 300
loop(limit)
pass
if __name__ == '__main__':
print('running...')
main()
generate and not keep: gnk.py
import time
from memory_profiler import profile
from random import sample
from sys import getsizeof
@profile
def loop(limit):
t = time.time()
i = 0
while True:
i += 1
duration = time.time() - t
a = tuple(sorted(sample(range(200), 100)))
if not i % int(1e4):
print('step {:.2e}...'.format(i))
if duration > limit:
print('done')
break
memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
getsizeof(a) + getsizeof(limit)
print('memory consumed: {:.2e}MB'.format(memory/2**20))
pass
def main():
limit = 300
loop(limit)
pass
if __name__ == '__main__':
print('running...')
main()
use "mprof" (needs module memory_profiler) in cmd/shell to check memory usage
mprof run my_file.py
mprof plot
result of gk.py
memory consumed: 4.00e+00MB
Filename: gk.py
Line # Mem usage Increment Line Contents
================================================
12 32.9 MiB 32.9 MiB @profile
13 def loop(limit):
14 32.9 MiB 0.0 MiB t = time.time()
15 32.9 MiB 0.0 MiB i = 0
16 32.9 MiB 0.0 MiB A = set()
17 32.9 MiB 0.0 MiB while True:
18 115.8 MiB 0.0 MiB i += 1
19 115.8 MiB 0.0 MiB duration = time.time() - t
20 115.8 MiB 0.3 MiB a = tuple(sorted(sample(range(200), 100)))
21 115.8 MiB 2.0 MiB A.add(a)
22 115.8 MiB 0.0 MiB if not i % int(1e4):
23 111.8 MiB 0.0 MiB print('step {:.2e}...'.format(i))
24 115.8 MiB 0.0 MiB if duration > limit:
25 115.8 MiB 0.0 MiB print('done')
26 115.8 MiB 0.0 MiB break
27 # method 1: delete the variables
28 # del duration, a
29 # method 2: use gc
30 # gc.collect()
31 memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
32 115.8 MiB 0.0 MiB getsizeof(a) + getsizeof(limit) + getsizeof(A)
33 115.8 MiB 0.0 MiB print('memory consumed: {:.2e}MB'.format(memory/2**20))
34 115.8 MiB 0.0 MiB pass
result of gnk.py
memory consumed: 9.08e-04MB
Filename: gnk.py
Line # Mem usage Increment Line Contents
================================================
11 33.0 MiB 33.0 MiB @profile
12 def loop(limit):
13 33.0 MiB 0.0 MiB t = time.time()
14 33.0 MiB 0.0 MiB i = 0
15 33.0 MiB 0.0 MiB while True:
16 33.0 MiB 0.0 MiB i += 1
17 33.0 MiB 0.0 MiB duration = time.time() - t
18 33.0 MiB 0.1 MiB a = tuple(sorted(sample(range(200), 100)))
19 33.0 MiB 0.0 MiB if not i % int(1e4):
20 33.0 MiB 0.0 MiB print('step {:.2e}...'.format(i))
21 33.0 MiB 0.0 MiB if duration > limit:
22 33.0 MiB 0.0 MiB print('done')
23 33.0 MiB 0.0 MiB break
24 memory = getsizeof(t) + getsizeof(i) + getsizeof(duration) + \
25 33.0 MiB 0.0 MiB getsizeof(a) + getsizeof(limit)
26 33.0 MiB 0.0 MiB print('memory consumed: {:.2e}MB'.format(memory/2**20))
27 33.0 MiB 0.0 MiB pass
I have two problems:
both the programs consumed more memory than the variables occupied. "gk.py" consumed 115.8MB, its variables occupied 4.00MB. "gnk.py" consumed 33.0MB, its variables occupied 9.08e-04MB. Why the programs consumed more memory than the corresponding variables occupied?
memory that "gk.py" consumed increases linearly with time. memory that "gnk.py" consumed remains constantly with time. Why does this happen?
Any help would be appreciated.
Upvotes: 1
Views: 432
Reputation: 1562
Given that the size of the set is being constantly increased, there will be a time when it will eventually consume all memory.
An estimative (from my computer):
10 seconds of code running ~ 5e4 tuples saved to the set
300 seconds of code running ~ 1.5e6 tuples saved to the set
1 tuple = 100 integers ~ 400bytes
total:
1.5e6 * 400bytes = 6e8bytes = 600MB filled in 300s
Upvotes: 2