Cheok Yan Cheng

Comparing WhisperX and Faster-Whisper on RunPod: Speed, Accuracy, and Optimization

Recently, I compared the performance of WhisperX and Faster-Whisper on a RunPod serverless worker using the following two handlers. Both load the large-v3 model on a CUDA GPU.

WhisperX

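import time

import whisperx

# Load the WhisperX model once at worker start-up, outside the handler.
# compute_type="float16" matches the Faster-Whisper setup below.
model = whisperx.load_model("large-v3", "cuda", compute_type="float16")


def run_whisperx_job(job):
    start_time = time.time()

    job_input = job['input']
    url = job_input.get('url', "")

    # whisperx.load_audio decodes the input with ffmpeg, so when a URL is
    # passed, fetching the audio also happens inside the timed region.
    print(f"🚧 Loading audio from {url}...")
    audio = whisperx.load_audio(url)
    print("✅ Audio loaded")

    print("Transcribing...")
    # Batched inference: batch_size controls how many audio chunks are
    # decoded together on the GPU.
    result = model.transcribe(audio, batch_size=16)

    end_time = time.time()
    time_s = end_time - start_time
    print(f"🎉 Transcription done: {time_s:.2f} s")

    # For easy migration, we follow the output format of RunPod's
    # official faster-whisper worker.
    # https://github.com/runpod-workers/worker-faster_whisper/blob/main/src/predict.py#L111
    output = {
        'detected_language': result['language'],
        'segments': result['segments']
    }

    return output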
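In the WhisperX handler, the single timer covers both `whisperx.load_audio(url)` and `model.transcribe(...)`. In the Faster-Whisper handler, it covers the download as well. The reported seconds therefore mix network I/O with model inference. The sketch below times each stage separately. `StageTimer` and `timed_job` are hypothetical helpers, not part of WhisperX or RunPod, and the loader and transcriber are stand-in functions so the example runs without a GPU.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: records wall-clock seconds per named stage."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate, so a stage entered twice is summed rather than overwritten.
            self.durations[name] = self.durations.get(name, 0.0) + (
                time.perf_counter() - start
            )

    def report(self):
        total = sum(self.durations.values())
        lines = [f"{name}: {secs:.2f} s" for name, secs in self.durations.items()]
        lines.append(f"total: {total:.2f} s")
        return "\n".join(lines)


def timed_job(load_audio, transcribe, url, timer):
    # Time I/O and inference separately so network speed does not
    # skew the model-to-model comparison.
    with timer.stage("load_audio"):
        audio = load_audio(url)
    with timer.stage("transcribe"):
        result = transcribe(audio)
    return result


def fake_load_audio(url):
    # Stand-in for whisperx.load_audio (assumption: returns a sample array).
    return [0.0] * 16000


def fake_transcribe(audio):
    # Stand-in for model.transcribe(audio, batch_size=16).
    return {"language": "en", "segments": []}


if __name__ == "__main__":
    timer = StageTimer()
    out = timed_job(fake_load_audio, fake_transcribe, "https://example.com/a.wav", timer)
    print(out["language"])
    print(timer.report())
```

In a real handler, you would pass `whisperx.load_audio` and a small wrapper around `model.transcribe(audio, batch_size=16)` in place of the stand-ins. Applying the same split to the Faster-Whisper handler (download, then transcribe) makes the transcription columns directly comparable.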

Faster-Whisper

import os
import time

from faster_whisper import WhisperModel
from runpod.serverless.utils import download_files_from_urls, rp_cleanup

# Load the Faster-Whisper model once at worker start-up
model = WhisperModel("large-v3", device="cuda", compute_type="float16")


def run_faster_whisper_job(job):
    start_time = time.time()

    job_input = job['input']
    url = job_input.get('url', "")

    # The audio is first downloaded to a local file; the download time is
    # included in the measurement.
    print(f"🚧 Downloading audio from {url}...")
    audio_path = download_files_from_urls(job['id'], [url])[0]
    print("✅ Audio downloaded")

    try:
        print("Transcribing...")
        # Sequential (non-batched) decoding with beam search (beam_size=5).
        # `segments` is a lazy generator: decoding runs while the loop below
        # iterates over it, so the timer must stop after the loop.
        segments, info = model.transcribe(audio_path, beam_size=5)

        output_segments = []
        for segment in segments:
            output_segments.append({
                "start": segment.start,
                "end": segment.end,
                "text": segment.text
            })

        end_time = time.time()
        time_s = end_time - start_time
        print(f"🎉 Transcription done: {time_s:.2f} s")

        output = {
            'detected_language': info.language,
            'segments': output_segments
        }
    finally:
        # ✅ Delete the downloaded file even if transcription fails
        try:
            if os.path.exists(audio_path):
                os.remove(audio_path)
                print(f"🗑️ Deleted {audio_path}")
            else:
                print("⚠️ File not found, skipping deletion")
        except OSError as e:
            print(f"❌ Error deleting file: {e}")

        rp_cleanup.clean(['input_objects'])

    return output
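One detail matters when timing the Faster-Whisper handler. `model.transcribe(...)` returns `segments` as a lazy generator, so the audio is only decoded while that generator is consumed. Stopping the clock right after the `transcribe` call would under-report the cost. The sketch below demonstrates this with a stand-in: `Segment` is a namedtuple that mimics only the `.start`, `.end` and `.text` attributes the handler reads, and `fake_transcribe` is hypothetical.

```python
from collections import namedtuple

# Stand-in for faster-whisper's segment objects (assumption: only the
# .start, .end and .text attributes used in the handler are modelled).
Segment = namedtuple("Segment", ["start", "end", "text"])


def fake_transcribe(chunks, log):
    """Hypothetical lazy transcriber: does its 'work' only when iterated."""
    for i, text in enumerate(chunks):
        log.append(f"decoded chunk {i}")
        yield Segment(start=float(i), end=float(i + 1), text=text)


def segments_to_dicts(segments):
    # Consuming the generator here is what actually triggers decoding,
    # so any timer must stop after this call, not after transcribe().
    return [{"start": s.start, "end": s.end, "text": s.text} for s in segments]


if __name__ == "__main__":
    log = []
    segments = fake_transcribe([" Hello", " world"], log)
    print(log)  # [] : nothing decoded yet
    output = segments_to_dicts(segments)
    print(log)  # ['decoded chunk 0', 'decoded chunk 1']
    print(output)
```

The handler above already stops its timer after the loop, so its numbers include the decoding work.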

General Findings

Accuracy Observations
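Accuracy differences between the two engines are easier to compare with a number attached. Word error rate (WER) is a common choice: the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in a reference transcript. Below is a minimal sketch. The normalisation step (lowercasing and dropping punctuation) is an assumption, and the sample sentences in the usage note are made up for illustration, not measured output from either engine.

```python
import re


def normalize(text):
    # Assumed normalisation: lowercase and drop punctuation so that
    # formatting differences are not counted as recognition errors.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev is the previous DP row: prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "The quick brown fox jumps over the lazy dog."
    # Illustrative (invented) outputs from the two engines:
    whisperx_text = "The quick brown fox jumped over the lazy dog"
    faster_whisper_text = "the quick brown box jumps over lazy dog"
    print(f"{word_error_rate(reference, whisperx_text):.3f}")        # 0.111
    print(f"{word_error_rate(reference, faster_whisper_text):.3f}")  # 0.222
```

Joining the `text` fields of each handler's `segments` output and scoring both against the same hand-checked reference turns accuracy observations into comparable figures.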

Optimization Questions

I was wondering what parameters in WhisperX I can experiment with or fine-tune in order to:

Thank you.
