tldr: Am I right in assuming that torch.cuda.init(), device = "cuda" and result = model.transcribe(etc) should be enough to enforce GPU usage?
I have checked on several forum posts and could not find a solution. Sorry if it's silly. I also posted on the whisper git but maybe it's not whisper-specific.
Here is my python script in a nutshell :
import whisper
import soundfile as sf
import torch
# specify the path to the input audio file
input_file = "H:\\path\\3minfile.WAV"
# specify the path to the output transcript file
output_file = "H:\\path\\transcript.txt"
# Cuda allows for the GPU to be used which is more optimized than the cpu
device = "cuda" # if torch.cuda.is_available() else "cpu"
# Load audio file
audio_data, sample_rate = sf.read(input_file, always_2d=True)
#load whisper model
model_size = "tiny"
print("loading model :", model_size)
model = whisper.load_model(model_size).to(device)
print(model_size, "model loaded")
# Initialize variables
results = []
language = "fr"
# Transcribe audio
with torch.cuda.device(device):
    result = model.transcribe(audio_data, language=language, fp16=False, word_timestamps=True)
However, it is returning the following error on the last line, hinting that it's trying to run it on cpu :
RuntimeError: [enforce fail at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 30623038517864 bytes.
I am using Jupyter, and I checked that the PyTorch version it is using is the CUDA/GPU build and not a CPU-only one.
So I really don't get it. Could there be a conflict between PyTorch libraries? Am I doing something wrong? Is the transcribe() function indeed using the CPU instead of the GPU?
I am using anaconda3; here is what conda list returns, in case it helps:
You need to make sure that you are installing PyTorch with CUDA support, to actually leverage your GPUs.
See here for reference documentation:
python3 -m pip install torch torchvision torchaudio --index-url
python3 -m pip install -U openai-whisper
You can then confirm in Python whether CUDA is available:
import torch
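The usual check is torch.cuda.is_available(). As a slightly more detailed sketch (the helper name cuda_status is made up for illustration and is not part of PyTorch or Whisper), the following separates the three common cases: a working GPU, a CPU-only build, and a CUDA build that cannot find a usable GPU.

```python
def cuda_status():
    """Return a short summary of PyTorch's CUDA support (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if torch.cuda.is_available():
        # A GPU is visible to this PyTorch build; report the first device.
        return f"PyTorch {torch.__version__} sees GPU: {torch.cuda.get_device_name(0)}"
    if torch.version.cuda is None:
        # torch.version.cuda is None when the build has no CUDA support.
        return f"PyTorch {torch.__version__} was built without CUDA support"
    return (
        f"PyTorch {torch.__version__} was built for CUDA {torch.version.cuda}, "
        "but no usable GPU was found"
    )


print(cuda_status())
```

Running this in the same Jupyter kernel as the transcription script matters, because a notebook can be bound to a different environment than the one shown by conda list.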
When using Whisper, you can place the model on the GPU directly when loading it. To do so, specify the device parameter in the load_model call.
So your corrected code would look like:
model = whisper.load_model(model_size, device="cuda")
You can now call the transcribe function directly; there is no need to wrap it in with torch.cuda.device(device).
Note that you do not actually need to specify the device parameter: Whisper uses CUDA by default if it is available.
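Putting these pieces together, a minimal sketch of the whole flow could look like the following. The function names pick_device and transcribe_file are illustrative only, and passing the file path straight to transcribe (so that Whisper decodes the audio itself, which requires ffmpeg) is an assumption made for this sketch rather than something taken from the original script.

```python
def pick_device(cuda_available):
    """Return the device string to load the Whisper model on (hypothetical helper)."""
    return "cuda" if cuda_available else "cpu"


def transcribe_file(audio_path, model_size="tiny", language="fr"):
    """Load a Whisper model on the best available device and transcribe one file."""
    # Imported lazily so pick_device can be used without these packages installed.
    import torch
    import whisper

    device = pick_device(torch.cuda.is_available())
    model = whisper.load_model(model_size, device=device)

    # Half precision only helps on the GPU; on CPU Whisper falls back to fp32.
    result = model.transcribe(audio_path, language=language, fp16=(device == "cuda"))
    return result["text"]
```

It would be called as transcribe_file("H:\\path\\3minfile.WAV"), and printing next(model.parameters()).device after loading is a quick way to confirm where the weights actually live.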
