relot

Reputation: 701

Get PID of highest memory consuming process with nvidia-smi

Currently I have a small bash one-liner that sums up the VRAM usage of all processes:

nvidia-smi | awk '{print $6}'| awk '{ SUM += $1} END { print SUM }'

Now I want to get the PID of the process that uses the most VRAM. After that, I would like to look up the user who owns that PID with:

ps -u -p $pid

EDIT: Currently my nvidia-smi output looks like this. I want to get PID 29187 because it uses the most VRAM (3649MiB).

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0  On |                  N/A |
| 35%   43C    P8    36W / 250W |   5012MiB / 11016MiB |     15%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      5512      G   /usr/lib/xorg/Xorg                           257MiB |
|    0      5786      G   kwin_x11                                      77MiB |
|    0      5790      G   /usr/bin/krunner                               6MiB |
|    0      5800      G   /usr/bin/plasmashell                         114MiB |
|    0     26439      G   /usr/lib/xorg/Xorg                            73MiB |
|    0     26457      G   /usr/bin/sddm-greeter                        132MiB |
|    0     29154    C+G   .../Binaries/Linux/CarlaUE4-Linux-Shipping   619MiB |
|    0     29187      C   python                                      3649MiB |
|    0     29999      G   /opt/ros/melodic/lib/rviz/rviz                62MiB |
+-----------------------------------------------------------------------------+

Alternatively as an image: https://i.sstatic.net/ypVza.png

Final script, which works:

# Sum of the per-process GPU memory column (rows after the 14-line header).
sum=$(nvidia-smi | awk 'NR>14{SUM+=$6} END{print SUM+0}')

# PID of the process with the largest GPU memory usage.
maxpid=$(nvidia-smi | awk 'NR>14 && 0+$6>MAX{MAX=0+$6;MAXPID=$3} END{print MAXPID}')

if [ "$sum" -lt 8000 ]
then
    echo "$(hostname) AVAILABLE vram $sum"
else
    # Owner of the heaviest process; "user=" suppresses the header line.
    user=$(ps -o user= -p "$maxpid")
    echo "$(hostname) false, $user is responsible"
fi

Upvotes: 2

Views: 1714

Answers (2)

Inian

Reputation: 85845

For more robust parsing that handles the binary units KiB, MiB, and GiB, you can do something like the following. It needs GNU awk for the array-sorting functionality.

nvidia-smi | awk '
$6 ~ /[KMG]iB/ {
    # Convert the usage column to bytes according to its unit suffix.
    n = $6 + 0
    if      ( index( $6, "KiB" ) ) { ram = n * 1024 }
    else if ( index( $6, "MiB" ) ) { ram = n * 1024 * 1024 }
    else                           { ram = n * 1024 * 1024 * 1024 }
    usage += ram
    map[$3] = ram
}
END {
    # GNU awk only: walk the array by value, largest first.
    PROCINFO["sorted_in"]="@val_num_desc"
    printf "Total RAM Usage = %s\n",((usage/1024)/1024)"MiB"
    for (i in map) {
        printf "Highest RAM Usage PID = %d Value = %s\n", i, ((map[i]/1024)/1024)"MiB"
        break
    }
}'

Upvotes: 1

JNevill

Reputation: 50248

You can combine your two awk statements and also fix the issue where the number after "Driver Version:" gets added to your total, like so:

nvidia-smi | awk 'NR>14{SUM+=$6}END{print SUM}'

That only considers a row if its row number is greater than 14 (where the data you care about lives).
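The fixed offset of 14 depends on the header having exactly the layout shown in the question, and that may differ with other driver versions or GPU counts. As a rough sketch of an alternative, you could select process rows by their shape instead of their position. The helper name `sum_gpu_mem` is made up for illustration, and the row pattern (PID in column 3, usage in MiB as the last value before the closing `|`) is an assumption based on the output in the question:

```shell
#!/bin/sh
# sum_gpu_mem (hypothetical helper): reads nvidia-smi's default table on
# stdin and prints the summed per-process usage.
# Assumption: every usage value is reported in MiB.
sum_gpu_mem() {
    awk '
    # A process row looks like: | GPU PID Type name ... <N>MiB |
    $1 == "|" && $NF == "|" && $3 ~ /^[0-9]+$/ && $(NF-1) ~ /^[0-9]+MiB$/ {
        sum += $(NF-1) + 0     # "257MiB" + 0 evaluates to 257
    }
    END { print sum + 0 }'
}

# Demo on three lines shaped like the output in the question.
# The header line is skipped because its third field is not a PID.
sum_gpu_mem <<'EOF'
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|    0      5512      G   /usr/lib/xorg/Xorg                           257MiB |
|    0     29187      C   python                                      3649MiB |
EOF
# prints 3906
```

On a real machine this would be used as `nvidia-smi | sum_gpu_mem`.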

Adding some logic to grab the max and print out the process:

nvidia-smi | awk 'NR>14{SUM+=$6} NR>14 && 0+$6>MAX{MAX=0+$6;MAXSTRING=$6;MAXPID=$3} END{print SUM,MAXPID,MAXSTRING}'

That will print out the sum, the PID that had the max GPU Memory Usage and that Memory Usage.
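If you need those three values in shell variables, one option is to split the single output line instead of running nvidia-smi once per value. The sketch below feeds the table from the question through the same awk program. `sample_output` and `gpu_summary` are names invented here so that it runs without a GPU:

```shell
#!/bin/sh
# sample_output: stands in for `nvidia-smi`; it replays the table from the
# question so the sketch can run on any machine.
sample_output() {
    cat <<'EOF'
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0  On |                  N/A |
| 35%   43C    P8    36W / 250W |   5012MiB / 11016MiB |     15%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      5512      G   /usr/lib/xorg/Xorg                           257MiB |
|    0      5786      G   kwin_x11                                      77MiB |
|    0      5790      G   /usr/bin/krunner                               6MiB |
|    0      5800      G   /usr/bin/plasmashell                         114MiB |
|    0     26439      G   /usr/lib/xorg/Xorg                            73MiB |
|    0     26457      G   /usr/bin/sddm-greeter                        132MiB |
|    0     29154    C+G   .../Binaries/Linux/CarlaUE4-Linux-Shipping   619MiB |
|    0     29187      C   python                                      3649MiB |
|    0     29999      G   /opt/ros/melodic/lib/rviz/rviz                62MiB |
+-----------------------------------------------------------------------------+
EOF
}

# gpu_summary (hypothetical name): the awk program from above, unchanged.
gpu_summary() {
    awk 'NR>14{SUM+=$6} NR>14 && 0+$6>MAX{MAX=0+$6;MAXSTRING=$6;MAXPID=$3} END{print SUM,MAXPID,MAXSTRING}'
}

# Split the single output line into the positional parameters $1 $2 $3.
set -- $(sample_output | gpu_summary)
sum=$1 maxpid=$2 maxmem=$3
echo "sum=$sum maxpid=$maxpid maxmem=$maxmem"
# prints: sum=4989 maxpid=29187 maxmem=3649MiB
```

On a real machine, replace `sample_output` with `nvidia-smi`.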

This breaks down, though, if the GPU Memory Usage column can switch units (for example between MiB and KiB), because 0+$6 only looks at the leading number and ignores the unit.
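If mixed units do turn up, one way around it is to convert every value to MiB before comparing. Here is a minimal sketch in plain awk (no GNU extensions). The helper `top_gpu_proc` and the function `to_mib` are hypothetical names, and the KiB and GiB rows in the demo are invented purely to show the effect. I have not confirmed that nvidia-smi actually emits them:

```shell
#!/bin/sh
# top_gpu_proc (hypothetical helper): reads nvidia-smi's table on stdin and
# prints "PID USAGE_IN_MIB" for the process with the largest usage.
# Assumption: PID is field 3 and usage is field 6, as in the question.
top_gpu_proc() {
    awk '
    # Convert a value such as "512KiB", "77MiB" or "2GiB" to MiB.
    function to_mib(s,    n) {
        n = s + 0
        if (s ~ /KiB$/) return n / 1024
        if (s ~ /GiB$/) return n * 1024
        return n
    }
    $3 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+[KMG]iB$/ {
        mib = to_mib($6)
        if (mib > max) { max = mib; pid = $3 }
    }
    END { if (pid != "") print pid, max }'
}

# Demo with made-up rows in three different units. Comparing only the
# leading numbers would pick PID 5790 (512); converting first picks 29187.
top_gpu_proc <<'EOF'
|    0      5786      G   kwin_x11                                      77MiB |
|    0      5790      G   /usr/bin/krunner                             512KiB |
|    0     29187      C   python                                         2GiB |
EOF
# prints: 29187 2048
```

The owner can then be looked up from the first field of that output, as in the question.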

Upvotes: 2
