Reputation: 21
I am running the GPT-2 code for the large model (774M). It is used to generate text samples through interactive_conditional_samples.py, link: here
So I've provided an input file containing prompts, which are automatically read to generate output. That output is also automatically copied into a file. In short, I'm not training the model, I'm only using it to generate text. Also, I'm using a single GPU.
The problem I'm facing is that the code is not utilizing the GPU fully.
Using the nvidia-smi command, I can see the utilization shown in the image below:
https://i.sstatic.net/f02p7.jpg
Upvotes: 2
Views: 831
Reputation:
It depends on your application. It is not unusual to have low GPU utilization when the batch_size is small. Try increasing the batch_size for more GPU utilization. In your case, you have set batch_size=1 in your program. Increase the batch_size to a larger number and verify the GPU utilization.
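For reference, here is a minimal sketch of how that could look, assuming the standard openai/gpt-2 repository layout where src/interactive_conditional_samples.py defines an interact_model function (the parameter values below are only illustrative):

```python
# Minimal sketch, assuming the openai/gpt-2 repo layout.
# Run from the src/ directory so the script's local imports
# (model, sample, encoder) resolve.
from interactive_conditional_samples import interact_model

interact_model(
    model_name='774M',  # the large model from the question
    nsamples=8,         # should be a multiple of batch_size
    batch_size=8,       # default is 1; raise it until GPU memory becomes the limit
    length=200,         # illustrative value; tokens generated per sample
    top_k=40,           # illustrative sampling setting
)
```

If I remember the script correctly, it exposes these parameters through python-fire, so you can also pass them as command-line flags (e.g. --batch_size=8 --nsamples=8) instead of editing the code, and nsamples needs to be divisible by batch_size.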
Let me explain using MNIST-sized networks. They are tiny, and it's hard to achieve high GPU (or CPU) efficiency with them. You will get higher computational efficiency with a larger batch size, meaning you can process more examples per second, but you will also get lower statistical efficiency, meaning you need to process more examples in total to reach a target accuracy. So it's a trade-off. For tiny character models, the statistical efficiency drops off very quickly after a batch size of about 100, so it's probably not worth growing the batch size beyond that for training. For inference, you should use the largest batch size you can.
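If you want to see the effect concretely, a rough (purely illustrative) way to compare throughput is to time how long it takes to produce the same total number of samples at different batch sizes. Here, sample_batch is a placeholder for whatever generation call you are using:

```python
import time

def examples_per_second(sample_batch, total=32, batch_size=8):
    # sample_batch(batch_size) is a placeholder: one forward pass
    # that generates batch_size samples.
    start = time.time()
    for _ in range(total // batch_size):
        sample_batch(batch_size)
    return total / (time.time() - start)

# Compare e.g. examples_per_second(my_sampler, batch_size=1) against
# batch_size=8; the larger batch should yield more examples per second
# until GPU memory becomes the limiting factor.
```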
Hope this answers your question. Happy Learning.
Upvotes: 1