I have tried loading a DistilBERT model in PyTorch on three different GPUs (GeForce GTX 1080 Ti, Tesla K80, Tesla V100). According to PyTorch's CUDA memory statistics (the torch.cuda.memory_* functions), the memory consumption is identical on all of them (534 MB reserved), but nvidia-smi reports a different figure for each: GTX 1080 Ti: 1181 MB, Tesla K80: 898 MB, Tesla V100: 1714 MB.
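For reference, here is a minimal sketch of how I collect the PyTorch-side numbers (the checkpoint name is illustrative and the byte-to-MB conversion is my assumption; my actual logging wrapper is omitted):

    import torch
    from transformers import DistilBertModel

    MB = 1024 ** 2  # torch.cuda memory stats are reported in bytes

    device = torch.device("cuda:0")
    model = DistilBertModel.from_pretrained("distilbert-base-uncased")  # illustrative checkpoint
    model.to(device)
    torch.cuda.synchronize()

    # These only count memory managed by PyTorch's caching allocator,
    # not the CUDA context or cuDNN/cuBLAS workspaces.
    print("max_memory_allocated (MB):", torch.cuda.max_memory_allocated(device) / MB)
    print("memory_allocated     (MB):", torch.cuda.memory_allocated(device) / MB)
    print("memory_reserved      (MB):", torch.cuda.memory_reserved(device) / MB)
    print("max_memory_reserved  (MB):", torch.cuda.max_memory_reserved(device) / MB)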
I chose the V100 hoping that its extra memory would let me accommodate more processes, but because of this overhead I cannot fit any more processes on the V100 than on the K80.
Versions: Python 3.6.11, transformers==2.3.0, torch==1.6.0
Any help would be appreciated.
The memory readings for each GPU are below.
----------------GTX 1080 Ti---------------------
2020-10-19 02:11:04,147 - CE - INFO - torch.cuda.max_memory_allocated() : 514.33154296875
2020-10-19 02:11:04,147 - CE - INFO - torch.cuda.memory_allocated() : 514.33154296875
2020-10-19 02:11:04,147 - CE - INFO - torch.cuda.memory_reserved() : 534.0
2020-10-19 02:11:04,148 - CE - INFO - torch.cuda.max_memory_reserved() : 534.0
The output of "nvidia-smi":
2020-10-19 02:11:04,221 - CE - INFO - | ID | Name | Serial | UUID || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
2020-10-19 02:11:04,222 - CE - INFO - | 0 | GeForce GTX 1080 Ti | [Not Supported] | GPU-58d5d4d3-07a1-81b4-ba67-8d6b46e342fb || 50C | 15% | 11% || 11178MB | 1181MB | 9997MB || Disabled | Disabled |
----------------Tesla K80---------------------
2020-10-19 12:15:37,030 - CE - INFO - torch.cuda.max_memory_allocated() : 514.33154296875
2020-10-19 12:15:37,031 - CE - INFO - torch.cuda.memory_allocated() : 514.33154296875
2020-10-19 12:15:37,031 - CE - INFO - torch.cuda.memory_reserved() : 534.0
2020-10-19 12:15:37,031 - CE - INFO - torch.cuda.max_memory_reserved() : 534.0
The output of "nvidia-smi":
2020-10-19 12:15:37,081 - CE - INFO - | ID | Name | Serial | UUID || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
2020-10-19 12:15:37,081 - CE - INFO - | 0 | Tesla K80 | 0324516191902 | GPU-1e7baee8-174b-2178-7115-cf4a063a8923 || 50C | 3% | 8% || 11441MB | 898MB | 10543MB || Disabled | Disabled |
----------------Tesla V100---------------------
2020-10-20 08:18:42,952 - CE - INFO - torch.cuda.max_memory_allocated() : 514.33154296875
2020-10-20 08:18:42,952 - CE - INFO - torch.cuda.memory_allocated() : 514.33154296875
2020-10-20 08:18:42,953 - CE - INFO - torch.cuda.memory_reserved() : 534.0
2020-10-20 08:18:42,953 - CE - INFO - torch.cuda.max_memory_reserved() : 534.0
The output of "nvidia-smi":
2020-10-20 08:18:43,020 - CE - INFO - | ID | Name | Serial | UUID || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
2020-10-20 08:18:43,020 - CE - INFO - | 0 | Tesla V100-SXM2-16GB | 0323617004258 | GPU-849088a3-508a-1737-7611-75a087f18085 || 29C | 0% | 11% || 16160MB | 1714MB | 14446MB || Enabled | Disabled |
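For completeness, the "Memory used" figures can also be read in machine-readable form straight from nvidia-smi; a minimal sketch (the parsing is my own, not part of the logging above):

    import subprocess

    # Query per-GPU memory in CSV form, without units, so the fields parse cleanly
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,memory.used,memory.free",
         "--format=csv,noheader,nounits"],
        encoding="utf-8",
    )
    for line in out.strip().splitlines():
        name, total, used, free = [f.strip() for f in line.split(",")]
        print(f"{name}: {used} MB used of {total} MB ({free} MB free)")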