haojie

Reputation: 753

How to get every second's GPU usage in Python

I have a model that runs with tensorflow-gpu on an NVIDIA device, and I want to log the GPU usage every second so that I can measure the average/max GPU usage. I can do this manually by opening two terminals, one to run the model and another to measure usage with nvidia-smi -l 1, but of course that is not a good way. I also tried to use a Thread to do it; here it is.

import subprocess as sp
import os
from threading import Thread

class MyThread(Thread):
    def __init__(self, func, args):
        super(MyThread, self).__init__()
        self.func = func
        self.args = args

    def run(self):
        self.result = self.func(*self.args)

    def get_result(self):
        return self.result

def get_gpu_memory():
    output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
    ACCEPTABLE_AVAILABLE_MEMORY = 1024
    COMMAND = "nvidia-smi -l 1 --query-gpu=memory.used --format=csv"
    memory_use_info = output_to_list(sp.check_output(COMMAND.split()))[1:]
    memory_use_values = [int(x.split()[0]) for i, x in enumerate(memory_use_info)]
    return memory_use_values

def run():
   pass

t1 = MyThread(run, args=())
t2 = MyThread(get_gpu_memory, args=())

t1.start()
t2.start()
t1.join()
t2.join()
res1 = t2.get_result()

However, this does not return the usage every second either. Is there a good solution?

Upvotes: 15

Views: 29020

Answers (4)

Huaijun Jiang

Reputation: 126

Try pip install nvidia-ml-py:

import pynvml

pynvml.nvmlInit()
deviceCount = pynvml.nvmlDeviceGetCount()
for i in range(deviceCount):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"|Device {i}| Mem Free: {mem.free/1024**2:5.2f}MB / {mem.total/1024**2:5.2f}MB | gpu-util: {util.gpu/100.0:3.1%} | gpu-mem: {util.memory/100.0:3.1%} |")

Reference: How Can I Obtain GPU Usage Through Code?
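
Since the question asks for a reading every second, here is a minimal polling sketch built on the same pynvml calls (the one-second interval, the sample count, and GPU index 0 are my assumptions, not part of the answer above):

import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assume GPU 0

utilization_samples = []
for _ in range(10):  # poll once per second for ~10 seconds
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    utilization_samples.append(util.gpu)  # GPU utilization in percent
    time.sleep(1)

print(f"avg: {sum(utilization_samples) / len(utilization_samples):.1f}% | max: {max(utilization_samples)}%")
pynvml.nvmlShutdown()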

Upvotes: 7

Sylvain

Reputation: 799

You might want to use GPUtil from https://github.com/anderskm/gputil#usage

import GPUtil

# For prints
GPUtil.showUtilization()

# To get values
GPUs = GPUtil.getGPUs()
load = GPUs[0].load
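
To turn this into a per-second measurement like the question asks for, a small sketch could poll getGPUs() in a loop (the interval, the sample count, and GPU index 0 are my assumptions):

import time
import GPUtil

loads = []
for _ in range(10):               # sample once per second for ~10 seconds
    gpu = GPUtil.getGPUs()[0]     # assume the first GPU
    loads.append(gpu.load * 100)  # .load is a fraction in [0, 1]
    time.sleep(1)

print(f"avg: {sum(loads) / len(loads):.1f}%, max: {max(loads):.1f}%")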

Upvotes: 0

olibear

Reputation: 13

Here is a more rudimentary way of getting this output, but it is just as effective and, I think, easier to understand. I added a small 10-value cache to get a good recent average and set the check interval to one second. Each second it prints the current usage and the average of the last 10 samples, so the operations that cause usage can be identified (which is what I think the original question was after).

import subprocess as sp
import time

memory_total=8192 #found with this command: nvidia-smi --query-gpu=memory.total --format=csv
memory_used_command = "nvidia-smi --query-gpu=memory.used --format=csv"

isolate_memory_value = lambda x: "".join(y for y in x.decode('ascii') if y in "0123456789")

def main():
    percentage_cache = []

    while True:
        memory_used = isolate_memory_value(sp.check_output(memory_used_command.split(), stderr=sp.STDOUT))
        percentage = float(memory_used)/float(memory_total)*100
        percentage_cache.append(percentage)
        percentage_cache = percentage_cache[max(0, len(percentage_cache) - 10):]

        print("curr: " + str(percentage) + " %", "\navg:  " + str(sum(percentage_cache)/len(percentage_cache))[:4] + " %\n")
        time.sleep(1)

main()

Upvotes: 0

Vivasvan Patel

Reputation: 2968

In the command nvidia-smi -l 1 --query-gpu=memory.used --format=csv

the -l stands for:

-l, --loop= Probe until Ctrl+C at specified second interval.

So the command:

COMMAND = 'nvidia-smi -l 1 --query-gpu=memory.used --format=csv'
sp.check_output(COMMAND.split())

will never terminate and return.

It works if you move the looping from the command (nvidia-smi) into Python.

Here is the code:

import subprocess as sp
import os
from threading import Thread , Timer
import sched, time

def get_gpu_memory():
    output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
    ACCEPTABLE_AVAILABLE_MEMORY = 1024
    COMMAND = "nvidia-smi --query-gpu=memory.used --format=csv"
    try:
        memory_use_info = output_to_list(sp.check_output(COMMAND.split(),stderr=sp.STDOUT))[1:]
    except sp.CalledProcessError as e:
        raise RuntimeError("command '{}' return with error (code {}): {}".format(e.cmd, e.returncode, e.output))
    memory_use_values = [int(x.split()[0]) for i, x in enumerate(memory_use_info)]
    # print(memory_use_values)
    return memory_use_values


def print_gpu_memory_every_5secs():
    """
        This function calls itself every 5 secs and print the gpu_memory.
    """
    Timer(5.0, print_gpu_memory_every_5secs).start()
    print(get_gpu_memory())

print_gpu_memory_every_5secs()

"""
Do stuff.
"""

Upvotes: 14
