Reputation: 4011
So, I'm stuck in quite a pickle. I have a library that, when imported, occupies roughly 2.6GB.
I assumed this was from its rather large dependencies, but when adding them all up I came to around 0.8GB of memory, not anywhere near enough to explain the size.
I tried 3 profilers:
memory-profiler -- doesn't work, just shows the size of the import as a whole
guppy (specifically the .heap() functionality from it) -- gave up after not being able to get anything meaningful from its output
pympler -- doesn't seem to count imports (I think?)... basically after I import the 2.6GB library it counts a total of 0.09GB used, which is odd.
What I really want is a profiler that would lead me in the right direction, e.g. tell me how much memory each file consumes after being imported. I'm considering doing this manually (breakpoints that will trigger on import + analyzing memory usage). But I'm curious if there's any other way to do it?
Upvotes: 3
Views: 585
Reputation: 29
I develop guppy3, so I will give an answer for guppy. I am much less familiar with other profilers. Here are some of the methods I would use (you can combine them too; say, "dominated set of producer profile" is also valid):
If you can measure the heap at a point in time before everything was imported, you can set a reference point and measure a relative heap at a later point in time:
>>> import guppy
>>> hp = guppy.hpy()
>>> hp.setref()
>>> import wave
>>> hp.heap()
Partition of a set of 777 objects. Total size = 86,330 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 213 27 22,258 26 22,258 26 str
1 209 27 16,736 19 38,994 45 tuple
2 67 9 9,648 11 48,642 56 function
3 62 8 8,928 10 57,570 67 types.CodeType
4 130 17 8,172 9 65,742 76 bytes
5 6 1 6,216 7 71,958 83 type
6 6 1 4,192 5 76,150 88 dict of type
7 3 0 2,760 3 78,910 91 dict of module
8 26 3 2,080 2 80,990 94 types.BuiltinMethodType
9 2 0 832 1 81,822 95 types.FrameType
<16 more rows. Type e.g. '_.more' to view.>
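If guppy is not available, the same before/after idea can be roughly sketched with the standard library's tracemalloc. This is not how guppy works, just an assumption-laden stand-in: the helper name traced_import_sizes and the example module list are made up for illustration, and tracemalloc only sees memory allocated through Python's own allocators, so memory taken by native libraries will not show up.

```python
import importlib
import tracemalloc


def traced_import_sizes(module_names):
    """Import each module in order and return {name: traced bytes added}.

    Hypothetical helper: a dependency shared by several modules is
    attributed to whichever module imports it first, and a module that
    was already imported earlier will report close to zero.
    """
    sizes = {}
    tracemalloc.start()
    try:
        for name in module_names:
            # get_traced_memory() returns (current, peak) in bytes
            before, _peak = tracemalloc.get_traced_memory()
            importlib.import_module(name)
            after, _peak = tracemalloc.get_traced_memory()
            sizes[name] = after - before
    finally:
        tracemalloc.stop()
    return sizes


if __name__ == "__main__":
    for name, size in traced_import_sizes(["wave", "json"]).items():
        print(f"{size:>12,d} bytes  {name}")
```

Running this once per dependency in a fresh interpreter should give a first hint about which import is the expensive one.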
If you have Python 3.8+, you can use guppy's producer profile to find the set of objects whose innermost frame of allocation resides in the file:
>>> import tracemalloc
>>> tracemalloc.start()
>>> import wave
>>> import guppy
>>> hp = guppy.hpy()
>>> hp.heap() & hp.Prod(wave.__file__)
Partition of a set of 82 objects. Total size = 13,896 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 50 61 6,800 49 6,800 49 function
1 3 4 3,192 23 9,992 72 type
2 3 4 2,584 19 12,576 91 dict of type
3 9 11 360 3 12,936 93 types.CellType
4 6 7 328 2 13,264 95 tuple
5 5 6 320 2 13,584 98 types.GetSetDescriptorType
6 3 4 216 2 13,800 99 builtins.weakref
7 3 4 96 1 13,896 100 int
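This will not include anything that is allocated indirectly, i.e. by code in other files that runs on behalf of the module. Compare the full producer profile with what is left after subtracting the module's own allocations: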
>>> import tracemalloc
>>> tracemalloc.start()
>>> import guppy
>>> hp = guppy.hpy()
>>> hp.setref()
>>> import wave
>>> heap = hp.heap()
>>> heap.byprod
/home/zhuyifei1999/guppy3/guppy/heapy/Classifiers.py:1198: UserWarning: Unable to determine sizeof(PyGC_Head) from _testcapi.SIZEOF_PYGC_HEAD, assuming 16
return self.mod.hv.cli_prod(memo)
Partition of a set of 688 objects. Total size = 77,934 bytes.
Index Count % Size % Cumulative % Producer (line of allocation)
0 471 68 45,835 59 45,835 59 <frozen importlib._bootstrap_external>:647
1 36 5 4,760 6 50,595 65 <frozen importlib._bootstrap>:228
2 8 1 2,576 3 53,171 68 /usr/lib/python3.9/wave.py:275
3 8 1 2,576 3 55,747 72 /usr/lib/python3.9/wave.py:94
4 8 1 2,040 3 57,787 74 /usr/lib/python3.9/chunk.py:51
5 7 1 1,800 2 59,587 76 /usr/lib/python3.9/collections/__init__.py:497
6 4 1 1,680 2 61,267 79 <frozen importlib._bootstrap>:36
7 7 1 1,584 2 62,851 81 /usr/lib/python3.9/wave.py:84
8 6 1 770 1 63,621 82 <unknown>:0
9 5 1 468 1 64,089 82 <frozen importlib._bootstrap_external>:123
<89 more rows. Type e.g. '_.more' to view.>
>>> heap - hp.Prod(wave.__file__)
Partition of a set of 605 objects. Total size = 63,918 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 196 32 20,946 33 20,946 33 str
1 61 10 10,736 17 31,682 50 types.CodeType
2 134 22 9,584 15 41,266 65 tuple
3 124 20 7,960 12 49,226 77 bytes
4 3 0 3,024 5 52,250 82 type
5 3 0 2,712 4 54,962 86 dict of module
6 16 3 2,176 3 57,138 89 function
7 26 4 1,872 3 59,010 92 types.BuiltinMethodType
8 3 0 1,512 2 60,522 95 dict of type
9 2 0 800 1 61,322 96 types.FrameType
<14 more rows. Type e.g. '_.more' to view.>
>>> _.byprod
Partition of a set of 605 objects. Total size = 63,918 bytes.
Index Count % Size % Cumulative % Producer (line of allocation)
0 471 78 45,835 72 45,835 72 <frozen importlib._bootstrap_external>:647
1 36 6 4,760 7 50,595 79 <frozen importlib._bootstrap>:228
2 8 1 2,040 3 52,635 82 /usr/lib/python3.9/chunk.py:51
3 7 1 1,800 3 54,435 85 /usr/lib/python3.9/collections/__init__.py:497
4 4 1 1,680 3 56,115 88 <frozen importlib._bootstrap>:36
5 6 1 770 1 56,885 89 <unknown>:0
6 5 1 468 1 57,353 90 <frozen importlib._bootstrap_external>:123
7 3 0 432 1 57,785 90 <frozen importlib._bootstrap>:353
8 5 1 423 1 58,208 91 /usr/lib/python3.9/collections/__init__.py:431
9 6 1 421 1 58,629 92 /usr/lib/python3.9/collections/__init__.py:477
<34 more rows. Type e.g. '_.more' to view.>
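For a per-file view without guppy, tracemalloc's own snapshot statistics can be grouped by file name. The sketch below is only an approximation of the producer profile above: the function name import_allocations_by_file is hypothetical, it assumes the module has a __file__ attribute, and like Prod it attributes each block to the innermost frame only (tracemalloc stores one frame by default).

```python
import importlib
import tracemalloc


def import_allocations_by_file(module_name, top=10):
    """Return (per_file, own_bytes) for allocations made while importing.

    per_file is a list of (filename, size_in_bytes, block_count), largest
    first; own_bytes is the total allocated directly from the module's
    own source file.
    """
    tracemalloc.start()
    try:
        module = importlib.import_module(module_name)
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()

    # Group every live traced block by the file of its allocating frame.
    stats = snapshot.statistics("filename")
    per_file = [(s.traceback[0].filename, s.size, s.count) for s in stats[:top]]

    # Keep only blocks whose allocating frame is in the module's own file.
    own = snapshot.filter_traces([tracemalloc.Filter(True, module.__file__)])
    own_bytes = sum(s.size for s in own.statistics("filename"))
    return per_file, own_bytes


if __name__ == "__main__":
    per_file, own_bytes = import_allocations_by_file("wave")
    for filename, size, count in per_file:
        print(f"{size:>10,d} bytes  {count:>6,d} blocks  {filename}")
    print(f"allocated directly in the module's file: {own_bytes:,d} bytes")
```

As in the guppy output, expect much of the total to be attributed to the frozen importlib frames rather than to the imported file itself.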
Another thing guppy can do is find everything that a set of objects (e.g. a set containing one module object) refers to but that nothing else refers to; in other words, everything whose reference chain must go through the given set. This is done through the dominos attribute. For example, this is everything the wave library has that isn't referred to by anything else:
>>> import wave
>>> import guppy
>>> guppy.hpy().iso(wave).dominos
Partition of a set of 601 objects. Total size = 64,664 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 173 29 17,641 27 17,641 27 str
1 173 29 13,688 21 31,329 48 tuple
2 57 9 8,208 13 39,537 61 function
3 52 9 7,488 12 47,025 73 types.CodeType
4 111 18 6,631 10 53,656 83 bytes
5 4 1 4,088 6 57,744 89 type
6 4 1 3,288 5 61,032 94 dict of type
7 1 0 1,192 2 62,224 96 dict of module
8 6 1 528 1 62,752 97 property
9 2 0 496 1 63,248 98 dict (no owner)
<10 more rows. Type e.g. '_.more' to view.>
However, if anything in the module is referred to by another module, then you may have to find such references to make it useful. For example, copyreg.dispatch_table refers to re.Pattern:
>>> import re
>>> import guppy
>>> guppy.hpy().iso(re).dominos
Partition of a set of 1 object. Total size = 88 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 100 88 100 88 100 module
>>> import copyreg
>>> copyreg.dispatch_table
{<class 'complex'>: <function pickle_complex at 0x7ff186208d40>, <class 're.Pattern'>: <function _pickle at 0x7ff1862b7d40>}
>>> guppy.hpy().iso(re, copyreg.dispatch_table).dominos
Partition of a set of 336 objects. Total size = 48,382 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 78 23 13,327 28 13,327 28 str
1 87 26 6,808 14 20,135 42 tuple
2 20 6 2,880 6 23,015 48 types.CodeType
3 19 6 2,736 6 25,751 53 function
4 6 2 2,688 6 28,439 59 dict (no owner)
5 41 12 2,419 5 30,858 64 bytes
6 1 0 2,288 5 33,146 69 dict of module
7 3 1 2,224 5 35,370 73 dict of type
8 3 1 1,864 4 37,234 77 type
9 20 6 1,600 3 38,834 80 types.MethodDescriptorType
<19 more rows. Type e.g. '_.more' to view.>
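To make the "dominated set" idea concrete, here is a toy model on a hand-written reference graph. It is not guppy's implementation, and the node names (root, mod, registry, ...) are invented for illustration; it only shows why an outside reference, like the one from copyreg.dispatch_table above, shrinks the result until the referrer is added to the set.

```python
def reachable(graph, roots, blocked=frozenset()):
    """Return the nodes reachable from roots without entering a blocked node."""
    seen = set()
    stack = [r for r in roots if r not in blocked]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for target in graph.get(node, ()):
            if target not in blocked and target not in seen:
                stack.append(target)
    return seen


def dominos(graph, roots, given):
    """Nodes that become unreachable from roots once `given` is removed."""
    given = frozenset(given)
    return reachable(graph, roots) - reachable(graph, roots, blocked=given)


# Toy reference graph: "registry" plays the role of an outside referrer.
graph = {
    "root": ["mod", "registry"],
    "mod": ["mod_dict"],
    "mod_dict": ["helper", "Pattern"],
    "registry": ["Pattern"],
    "Pattern": ["pattern_dict"],
}

# Pattern is kept alive by registry, so it is not dominated by mod alone.
print(sorted(dominos(graph, ["root"], {"mod"})))
# Adding the outside referrer to the set pulls Pattern and its dict in.
print(sorted(dominos(graph, ["root"], {"mod", "registry"})))
```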
Alternatively you could put everything in the library as your set:
>>> guppy.hpy().iso(re, re.__dict__, *re.__dict__.keys(), *re.__dict__.values()).dominos
Partition of a set of 1,621 objects. Total size = 202,064 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 542 33 44,132 22 44,132 22 str
1 422 26 38,312 19 82,444 41 tuple
2 192 12 26,552 13 108,996 54 bytes
3 6 0 19,904 10 128,900 64 dict of module
4 94 6 13,656 7 142,556 71 types.CodeType
5 84 5 12,096 6 154,652 77 function
6 9 1 8,248 4 162,900 81 type
7 7 0 7,512 4 170,412 84 frozenset
8 13 1 5,752 3 176,164 87 dict (no owner)
9 9 1 5,064 3 181,228 90 dict of type
<29 more rows. Type e.g. '_.more' to view.>
However, this will include every module your module refers to:
>>> hp = guppy.hpy()
>>> hp.iso(re, re.__dict__, *re.__dict__.keys(), *re.__dict__.values()).dominos - hp.iso(re, copyreg.dispatch_table).dominos
Partition of a set of 1,275 objects. Total size = 153,153 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 335 26 31,504 21 31,504 21 tuple
1 454 36 30,276 20 61,780 40 str
2 151 12 24,133 16 85,913 56 bytes
3 5 0 17,616 12 103,529 68 dict of module
4 74 6 10,776 7 114,305 75 types.CodeType
5 65 5 9,360 6 123,665 81 function
6 7 1 7,512 5 131,177 86 frozenset
7 6 0 6,384 4 137,561 90 type
8 7 1 3,064 2 140,625 92 dict (no owner)
9 6 0 2,840 2 143,465 94 dict of type
<17 more rows. Type e.g. '_.more' to view.>
>>> _.byid
Set of 1,275 <mixed> objects. Total size = 153,153 bytes.
Index Size % Cumulative % Brief
0 4,712 3.1 4,712 3.1 dict of module: builtins
1 4,712 3.1 9,424 6.2 dict of module: sre_compile
2 4,712 3.1 14,136 9.2 dict of module: sre_parse
3 3,271 2.1 17,407 11.4 bytes: b't\x00|\x01\...x00|\x05S\x00'
4 2,288 1.5 19,695 12.9 dict of module: _locale
5 2,280 1.5 21,975 14.3 frozenset: 0x7ff1863cbb90
6 2,280 1.5 24,255 15.8 frozenset: 0x7ff1863cbc80
7 1,843 1.2 26,098 17.0 bytes: b'|\x00j\x00}...x00d\x00S\x00'
8 1,545 1.0 27,643 18.0 bytes: b'd\x00d\x01\...x04d\x00S\x00'
9 1,192 0.8 28,835 18.8 dict (no owner): 0x7ff18628e780*33
<1,265 more rows. Type e.g. '_.more' to view.>
A saner approach (still an overestimation) is to eliminate the modules from this set (because they are referred to by sys.modules):
>>> import types
>>> (hp.iso(re) | (hp.iso(re.__dict__, *re.__dict__.keys(), *re.__dict__.values()) - types.ModuleType)).dominos
Partition of a set of 424 objects. Total size = 61,004 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 131 31 16,735 27 16,735 27 str
1 96 23 7,592 12 24,327 40 tuple
2 2 0 7,000 11 31,327 51 dict of module
3 21 5 3,024 5 34,351 56 types.CodeType
4 4 1 2,928 5 37,279 61 type
5 20 5 2,880 5 40,159 66 function
6 44 10 2,709 4 42,868 70 bytes
7 4 1 2,600 4 45,468 75 dict of type
8 5 1 2,440 4 47,908 79 dict (no owner)
9 20 5 1,600 3 49,508 81 types.MethodDescriptorType
<25 more rows. Type e.g. '_.more' to view.>
>>> _ - hp.iso(re, copyreg.dispatch_table).dominos
Partition of a set of 139 objects. Total size = 19,025 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 1 4,712 25 4,712 25 dict of module
1 56 40 3,687 19 8,399 44 str
2 3 2 2,224 12 10,623 56 dict of type
3 3 2 1,864 10 12,487 66 type
4 20 14 1,600 8 14,087 74 types.MethodDescriptorType
5 12 9 976 5 15,063 79 tuple
6 10 7 880 5 15,943 84 types.WrapperDescriptorType
7 7 5 560 3 16,503 87 types.MemberDescriptorType
8 5 4 400 2 16,903 89 types.GetSetDescriptorType
9 3 2 360 2 17,263 91 dict of _sitebuiltins._Printer
<11 more rows. Type e.g. '_.more' to view.>
>>> _.byid
Set of 139 <mixed> objects. Total size = 19,025 bytes.
Index Size % Cumulative % Brief
0 4,712 24.8 4,712 24.8 dict of module: builtins
1 1,192 6.3 5,904 31.0 dict of type: ..Pattern
2 1,064 5.6 6,968 36.6 type: re.error
3 656 3.4 7,624 40.1 dict of type: ..Match
4 410 2.2 8,034 42.2 str: 'Exception ra...e None)\n '
5 400 2.1 8,434 44.3 type: re.Match
6 400 2.1 8,834 46.4 type: re.Pattern
7 376 2.0 9,210 48.4 dict of type: ..error
8 195 1.0 9,405 49.4 bytes: b'|\x01|\x00_...x00d\x00S\x00'
9 160 0.8 9,565 50.3 dict of _frozen_importlib.ModuleSpec: 0x7ff1864e0490
<129 more rows. Type e.g. '_.more' to view.>
Upvotes: 1