amateur

Reputation: 21

TensorFlow not fully utilizing GPU in GPT-2 program

I am running the GPT-2 code for the large model (774M). It is used to generate text samples through interactive_conditional_samples.py, link: here

I've given it an input file containing prompts, which are automatically selected to generate output. That output is also automatically copied into a file. In short, I'm not training the model; I'm only using it to generate text. Also, I'm using a single GPU.

The problem I'm facing is that the code is not fully utilizing the GPU.

By running the nvidia-smi command, I captured the output shown in the image below:

https://i.sstatic.net/f02p7.jpg
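(For a live view of the same figures while the script is generating, nvidia-smi can be run in loop mode; this is a standard nvidia-smi option, not something from the linked screenshot:)

    # refresh the utilization/memory readout every second
    nvidia-smi -l 1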

Upvotes: 2

Views: 831

Answers (1)

user11530462


It depends on your application. It is not unusual to see low GPU utilization when the batch_size is small. Try increasing the batch_size to get more GPU utilization.

In your case, you have set batch_size=1 in your program. Increase the batch_size to a larger number and verify the GPU utilization.
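If you are running the stock openai/gpt-2 script, batch_size (together with a matching nsamples, which must be divisible by batch_size) can be passed on the command line. A minimal sketch, assuming the default repository layout:

    # generate 8 samples per prompt, 8 at a time, instead of the default batch_size=1
    python3 src/interactive_conditional_samples.py --model_name=774M --nsamples=8 --batch_size=8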

Let me explain using MNIST-sized networks. They are tiny, and it's hard to achieve high GPU (or CPU) efficiency for them. You will get higher computational efficiency with a larger batch size, meaning you can process more examples per second, but you will also get lower statistical efficiency, meaning you need to process more examples in total to reach the target accuracy. So it's a trade-off. For tiny character models, the statistical efficiency drops off very quickly beyond a batch_size of about 100, so it's probably not worth growing the batch size further for training. For inference, you should use the largest batch size you can.
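As a rough illustration of the computational side of that trade-off, the snippet below (a sketch, not taken from your program; it assumes TensorFlow with the bundled Keras API and measures throughput only) trains a tiny MNIST classifier for one epoch at several batch sizes and prints examples per second:

    import time
    import tensorflow as tf

    # small MNIST subset so the batch_size=1 run finishes quickly
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train[:10000].astype("float32") / 255.0
    y_train = y_train[:10000]

    def build_model():
        return tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])

    for batch_size in (1, 32, 512):
        model = build_model()
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        start = time.time()
        model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
        print("batch_size=%d: ~%.0f examples/sec"
              % (batch_size, len(x_train) / (time.time() - start)))

Throughput (and with it GPU utilization) climbs sharply as the batch size grows, which is the same effect you should see in the GPT-2 sampling script when batch_size is raised above 1.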

Hope this answers your question. Happy Learning.

Upvotes: 1
