Shabbyrobe

Reputation: 12628

Python equivalent of PHP's memory_get_usage()?

I've already found the following question, but I was wondering if there was a quicker and dirtier way to estimate how much memory the Python interpreter is currently using for my script, without relying on external libraries.

I'm coming from PHP, where I used memory_get_usage() and memory_get_peak_usage() a lot for this purpose, and I was hoping to find an equivalent.

Upvotes: 24

Views: 9852

Answers (6)

Don Kirkby

Reputation: 56620

The same kind of data that's in /proc/self/status is also in /proc/self/statm. However, statm is easier to parse, because it's just a space-delimited list of several statistics. I haven't been able to tell if both files are always present.

/proc/[pid]/statm

Provides information about memory usage, measured in pages. The columns are:

  • size (1) total program size (same as VmSize in /proc/[pid]/status)
  • resident (2) resident set size (same as VmRSS in /proc/[pid]/status)
  • shared (3) number of resident shared pages (i.e., backed by a file) (same as RssFile+RssShmem in /proc/[pid]/status)
  • text (4) text (code)
  • lib (5) library (unused since Linux 2.6; always 0)
  • data (6) data + stack
  • dt (7) dirty pages (unused since Linux 2.6; always 0)

Here's a simple example:

from pathlib import Path
from resource import getpagesize

PAGESIZE = getpagesize()
PATH = Path('/proc/self/statm')


def get_resident_set_size() -> int:
    """Return the current resident set size in bytes."""
    # statm columns are: size resident shared text lib data dt
    statm = PATH.read_text()
    fields = statm.split()
    return int(fields[1]) * PAGESIZE


data = []
start_memory = get_resident_set_size()
for _ in range(10):
    data.append('X' * 100000)
    print(get_resident_set_size() - start_memory)

That produces a list that looks something like this:

0
0
368640
368640
368640
638976
638976
909312
909312
909312

You can see that it jumps by about 300,000 bytes after roughly 3 allocations of 100,000 bytes.
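
If you want more than the resident column, the column list above maps naturally onto named fields. This is only a sketch: the Statm class and parse_statm() helper are made-up names for illustration, and the sample line is invented rather than read from a real process.

```python
from typing import NamedTuple


class Statm(NamedTuple):
    """The seven statm columns, in the order the kernel prints them."""
    size: int
    resident: int
    shared: int
    text: int
    lib: int
    data: int
    dt: int


def parse_statm(line: str, page_size: int) -> Statm:
    """Parse one statm line and convert every page count to bytes."""
    pages = [int(field) for field in line.split()]
    return Statm(*(count * page_size for count in pages))


# Invented sample values; on Linux you would read /proc/self/statm instead.
sample = '2000 350 120 40 0 600 0'
usage = parse_statm(sample, page_size=4096)
print(usage.resident)  # 1433600
```

On a real system you would pass Path('/proc/self/statm').read_text() and resource.getpagesize() instead of the sample values.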

Upvotes: 1

Nathan Craike

Reputation: 5347

You could also use the getrusage() function from the standard library module resource. The resulting object has the attribute ru_maxrss, which gives total peak memory usage for the calling process:

>>> import resource
>>> resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
2656

The Python docs aren't clear on what the units are exactly. The Mac OS X man page for getrusage(2) has described the units as kilobytes, but macOS actually reports ru_maxrss in bytes, so check the value on your platform.

The Linux man page isn't clear, but it seems to be equivalent to the /proc/self/status information (i.e. kilobytes) described in the accepted answer. For the same process as above, running on Linux, the function listed in the accepted answer gives:

>>> memory_usage()
{'peak': 6392, 'rss': 2656}

This may not be quite as easy to use as the /proc/self/status solution, but it is in the standard library, so (provided the units are standard) it should be usable on Unix systems which lack /proc/ (e.g. Mac OS X and other Unixes). The resource module is Unix-only, so it won't work on Windows.

The getrusage() function can also be given resource.RUSAGE_CHILDREN to get the usage for child processes, and (on some systems) resource.RUSAGE_BOTH for total (self and child) process usage.
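
As a rough illustration of RUSAGE_CHILDREN, the sketch below runs a throwaway child process and then reads the children's peak figure. The child command is invented for the example, the figure only covers children that have already exited and been waited for, and the units are whatever your platform uses for ru_maxrss.

```python
import resource
import subprocess
import sys

# Run a short-lived child that builds a ~50 MB string, and wait for it to exit.
subprocess.run([sys.executable, '-c', "x = 'x' * 50_000_000"], check=True)

# RUSAGE_CHILDREN only accounts for children that have terminated
# and been waited for, which subprocess.run() does for us.
children_peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
own_peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print('self:', own_peak, 'children:', children_peak)
```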

Note that ru_maxrss is a high-water mark, so this covers the memory_get_peak_usage() case rather than memory_get_usage(). I'm unsure if any other function from the resource module can give current usage.
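
Because the units of ru_maxrss vary between platforms, it may help to normalise them. This is a minimal sketch, not a definitive implementation: peak_rss_bytes() is a made-up helper name, and it assumes that macOS reports bytes while Linux reports kilobytes. Other Unixes are treated like Linux here, which may not be right for your system.

```python
import resource
import sys


def peak_rss_bytes() -> int:
    """Return the peak resident set size of this process in bytes."""
    maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Assumption: macOS reports bytes, Linux reports kilobytes.
    # Every other platform is treated like Linux, which may be wrong.
    if sys.platform == 'darwin':
        return maxrss
    return maxrss * 1024


print(peak_rss_bytes())
```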

Upvotes: 19

saaj

Reputation: 25194

/proc/self/status has the following relevant keys:

  • VmPeak: Peak virtual memory size.
  • VmSize: Virtual memory size.
  • VmHWM: Peak resident set size ("high water mark").
  • VmRSS: Resident set size.

So if the concern is resident memory, the following code can be used to retrieve it:

def get_proc_status(keys=None):
    with open('/proc/self/status') as f:
        data = dict(map(str.strip, line.split(':', 1)) for line in f)

    return tuple(data[k] for k in keys) if keys else data

peak, current = get_proc_status(('VmHWM', 'VmRSS'))
print(peak, current)  # outputs: 14280 kB 13696 kB
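
Note that the values come back as strings with a unit suffix. If you need numbers, something like the following sketch could convert them. The status_value_to_bytes() name is made up for illustration, and it assumes the value always has the form '<number> kB', where the kernel's kB means 1024 bytes.

```python
def status_value_to_bytes(value: str) -> int:
    """Convert a /proc/self/status value such as '13696 kB' to bytes."""
    number, unit = value.split()
    if unit != 'kB':
        raise ValueError(f'unexpected unit: {unit!r}')
    # In /proc/self/status, "kB" means 1024 bytes.
    return int(number) * 1024


print(status_value_to_bytes('14280 kB'))  # 14622720
```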

Here's an article by memory_profiler's author that explains that getrusage's ru_maxrss isn't always a practical measure. Also note that VmHWM may differ from ru_maxrss (in some cases I see that ru_maxrss is greater). But in the simple case they are the same:

import resource


def report():
    maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    peak, current = get_proc_status(('VmHWM', 'VmRSS'))
    print(current, peak, maxrss)


report()

s = ' ' * 2 ** 28  # 256MiB
report()

s = None
report()

In addition, here's a very comprehensible yet informative case study by the atop authors, which explains what kernel, virtual and resident memory are and how they are interdependent.

Upvotes: 1

johndodo

Reputation: 18271

Accepted answer rules, but it might be easier (and more portable) to use psutil. It does the same and a lot more.

UPDATE: muppy is also very convenient (and much better documented than guppy/heapy).

Upvotes: 11

Martin Geisler

Reputation: 73748

A simple solution for Linux and other systems with /proc/self/status is the following code, which I use in a project of mine:

def memory_usage():
    """Memory usage of the current process in kilobytes."""
    result = {'peak': 0, 'rss': 0}
    # This will only work on systems with a /proc file system
    # (like Linux).
    with open('/proc/self/status') as status:
        for line in status:
            parts = line.split()
            # Turn 'VmPeak:' into 'peak' and 'VmRSS:' into 'rss'.
            key = parts[0][2:-1].lower()
            if key in result:
                result[key] = int(parts[1])
    return result

It returns the current resident memory size (VmRSS, which is probably what people mean when they talk about how much RAM an application is using) and the peak virtual memory size (VmPeak). It is easy to extend it to grab other pieces of information from the /proc/self/status file, such as VmHWM for the peak resident size.

For the curious: the full output of cat /proc/self/status looks like this:

% cat /proc/self/status
Name:   cat
State:  R (running)
Tgid:   4145
Pid:    4145
PPid:   4103
TracerPid:      0
Uid:    1000    1000    1000    1000
Gid:    1000    1000    1000    1000
FDSize: 32
Groups: 20 24 25 29 40 44 46 100 1000 
VmPeak:     3580 kB
VmSize:     3580 kB
VmLck:         0 kB
VmHWM:       472 kB
VmRSS:       472 kB
VmData:      160 kB
VmStk:        84 kB
VmExe:        44 kB
VmLib:      1496 kB
VmPTE:        16 kB
Threads:        1
SigQ:   0/16382
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed:   03
Cpus_allowed_list:      0-1
Mems_allowed:   1
Mems_allowed_list:      0
voluntary_ctxt_switches:        0
nonvoluntary_ctxt_switches:     0
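
Given output like the above, one way to grab every Vm* field at once might look like this. It is only a sketch: parse_vm_fields() is a made-up name, it assumes every Vm* line has the form 'Key: <number> kB', and the sample text is a trimmed copy of the output above rather than a live read.

```python
def parse_vm_fields(status_text: str) -> dict:
    """Return every 'Vm*' field of a /proc/[pid]/status dump, in kB."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(':')
        if key.startswith('Vm'):
            # Values look like '     472 kB'; keep only the number.
            fields[key] = int(value.split()[0])
    return fields


# Trimmed sample; on Linux you would read /proc/self/status instead.
sample = """\
VmPeak:     3580 kB
VmSize:     3580 kB
VmHWM:       472 kB
VmRSS:       472 kB
Threads:        1
"""
vm = parse_vm_fields(sample)
print(vm['VmHWM'], vm['VmRSS'])  # 472 472
```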

Upvotes: 31

dfa

Reputation: 116324

Try heapy.

Upvotes: 2
