Alligator

Reputation: 730

How do I run Whisper on an entire directory?

I'd like to transcribe speech to text using Whisper. I have been able to successfully run it on a single file using the command:

whisper audio.wav

I'd like to run it on a large number of files in a single directory called "Audio" on my desktop. I tried to write this in Python as follows:

import whisper
import os

model = whisper.load_model("base")

for filename in os.listdir('Audio'):   
    model.transcribe(filename)   

It appears to start, but then gives me some errors about "No such file or directory." Is there some way I can correct this to run Whisper on all the .wav files in my Audio directory?

Error:

/opt/homebrew/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
/opt/homebrew/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
  File "/opt/homebrew/lib/python3.10/site-packages/ffmpeg/_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/user/Desktop/transcribe.py", line 7, in <module>
    model.transcribe(filename)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/transcribe.py", line 84, in transcribe
    mel = log_mel_spectrogram(audio)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/audio.py", line 111, in log_mel_spectrogram
    audio = load_audio(audio)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/audio.py", line 47, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 14.0.0 (clang-1400.0.29.202)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.1.2_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
221211_1834.wav: No such file or directory

Upvotes: 3

Views: 5869

Answers (4)

PwL

Reputation: 11

In my case I had 30 audio files (.m4a) in one folder, all in English. The following script works well:

import os
import whisper
from tqdm import tqdm

# Define the folder where the m4a files are located
root_folder = "/Users/_____/Music/Renewal"

# Set up Whisper client
print("Loading whisper model...")
model = whisper.load_model("base")
print("Whisper model complete.")

# Get the number of m4a files in the root folder and its sub-folders
print("Getting number of files to transcribe...")
num_files = sum(1 for dirpath, dirnames, filenames in os.walk(root_folder) for filename in filenames if filename.endswith(".m4a"))
print("Number of files: ", num_files)

# Transcribe the m4a files and display a progress bar
with tqdm(total=num_files, desc="Transcribing Files") as pbar:
    for dirpath, dirnames, filenames in os.walk(root_folder):
        for filename in filenames:
            if filename.endswith(".m4a"):
                filepath = os.path.join(dirpath, filename)
                result = model.transcribe(filepath, language="en", fp16=False, verbose=True)
                transcription = result['text']
                # Write transcription to text file
                filename_no_ext = os.path.splitext(filename)[0]
                with open(os.path.join(dirpath, filename_no_ext + '.txt'), 'w') as f:
                    f.write(transcription)
                pbar.update(1)

Upvotes: 1

Franck Dernoncourt

Reputation: 83387

How do I run Whisper on an entire directory?

In a Bash-compatible shell, one can use the following command:

for i in *.{flac,mp3,wav}; do whisper "$i" --model large > "$i".txt; done

It will loop over all the .flac, .mp3, and .wav files in the current folder, and use whisper to transcribe them.

Limitation: the above command reloads the model for each file, which is inefficient, especially if transcribing many small audio files with the large whisper model.


Example of output file:

[00:00.000 --> 00:08.780]  So if you ever have any offline media in Premiere, all you have to do is when the locate box
[00:08.780 --> 00:14.540]  has popped up, just select the cine punch folder and hit the search button and all it's
[00:14.540 --> 00:19.580]  going to do is connect one file, hit OK, and then it will reconnect everything else.

(Text author: PHANTAZMA VFX - Tutorials for Video Editing. The text is the transcription of https://youtu.be/zHxK6Wadd8Q?si=r2_fVc7J8ohFVCSq, which is under Creative Commons Attribution license with reuse allowed)
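
The reload limitation noted above can be avoided by loading the model once in Python and reusing it for every file. The following is a minimal sketch, not a definitive implementation: the function name transcribe_folder and its parameters are hypothetical, and it assumes the model object exposes transcribe(path) returning a dict with a "text" key, as the object returned by whisper.load_model() does. Unlike the shell loop, it writes only the plain text, without timestamps.

```python
from pathlib import Path


def transcribe_folder(model, folder, suffixes=(".flac", ".mp3", ".wav")):
    """Transcribe every matching audio file in `folder` with one loaded model.

    `model` is assumed to expose transcribe(path) -> {"text": ...}.
    Returns the list of transcript files that were written.
    """
    written = []
    for audio_path in sorted(Path(folder).iterdir()):
        # Skip sub-folders and anything that is not one of the audio types.
        if not audio_path.is_file() or audio_path.suffix.lower() not in suffixes:
            continue
        result = model.transcribe(str(audio_path))
        # Mirror the shell loop's naming: "clip.wav" -> "clip.wav.txt".
        out_path = audio_path.with_name(audio_path.name + ".txt")
        out_path.write_text(result["text"], encoding="utf-8")
        written.append(out_path)
    return written
```

With Whisper installed, it could be called as transcribe_folder(whisper.load_model("large"), "."), so the model is loaded a single time regardless of how many files are in the folder.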

Upvotes: 1

Joseph Thomas

Reputation: 1

Transcribe multiple files in a directory in English only.

I recommend modifying your script so that Whisper transcribes the audio files as English only. This is useful when you already know the language of the recordings. Whisper's automatic language detection is not perfect, because not every audio file is perfect: a file may be misdetected as Welsh, Dutch, or Norwegian, which can render the transcription of a whole folder useless. Simply load the English-only model, "base.en". Excuse me, coders and moderators; we all had a beginning. Thank you.

import whisper

# Set up Whisper client with the English-only model
print("Loading whisper model...")
model = whisper.load_model("base.en")
print("Whisper model complete.")
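
Another way to skip language detection, used in another answer on this page, is to pass language="en" to model.transcribe. A minimal sketch follows; the helper name transcribe_as_english is hypothetical, and the model is assumed to be the object returned by whisper.load_model().

```python
def transcribe_as_english(model, audio_path):
    """Transcribe one file as English, skipping automatic language detection."""
    # language="en" fixes the language up front instead of detecting it.
    # fp16=False avoids the "FP16 is not supported on CPU" warning when
    # running on a CPU, as in the question's traceback.
    result = model.transcribe(audio_path, language="en", fp16=False)
    return result["text"]
```

Either approach keeps a misdetected language from affecting a whole batch of files.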

Upvotes: 0

user21201312

Reputation: 46

Here's an option for you. It does the following:

1 - Finds all .wav files in the "root folder" & sub-folders. You need to change this to your "Audio" folder location.

2 - Shows progress bar as it's transcribing the files (done using tqdm).

3 - Saves a .txt file containing the transcription next to the .wav files.

CODE:

import os
import whisper
from tqdm import tqdm

# Define the folder where the wav files are located
root_folder = "/Users/downloads"

# Set up Whisper client
print("Loading whisper model...")
model = whisper.load_model("base")
print("Whisper model complete.")

# Get the number of wav files in the root folder and its sub-folders
print("Getting number of files to transcribe...")
num_files = sum(1 for dirpath, dirnames, filenames in os.walk(root_folder) for filename in filenames if filename.endswith(".wav"))
print("Number of files: ", num_files)

# Transcribe the wav files and display a progress bar
with tqdm(total=num_files, desc="Transcribing Files") as pbar:
    for dirpath, dirnames, filenames in os.walk(root_folder):
        for filename in filenames:
            if filename.endswith(".wav"):
                filepath = os.path.join(dirpath, filename)
                result = model.transcribe(filepath, fp16=False, verbose=True)
                transcription = result['text']
                # Write transcription to text file
                filename_no_ext = os.path.splitext(filename)[0]
                with open(os.path.join(dirpath, filename_no_ext + '.txt'), 'w') as f:
                    f.write(transcription)
                pbar.update(1)
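
The "No such file or directory" error in the question comes from passing bare file names to model.transcribe: os.listdir returns names such as 221211_1834.wav without the folder prefix, so ffmpeg looks for them in the current working directory. The os.path.join call in the script above is what avoids this. A minimal sketch of the same fix for the question's flat (non-recursive) loop is shown here; the helper name audio_paths is hypothetical.

```python
import os


def audio_paths(folder, suffix=".wav"):
    """Return the paths of files in `folder` whose names end with `suffix`."""
    paths = []
    for filename in sorted(os.listdir(folder)):
        if filename.lower().endswith(suffix):
            # os.listdir yields bare names; joining them with the folder
            # gives a path that ffmpeg can actually open.
            paths.append(os.path.join(folder, filename))
    return paths
```

The question's loop would then become "for path in audio_paths('Audio'): model.transcribe(path)". Note that 'Audio' is itself resolved relative to the working directory, so an absolute path is safer if the script is run from somewhere other than the desktop.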

Upvotes: 3
