user22347502

Reputation: 41

How to transcribe multiple audio files at once using a fine-tuned Whisper model?

TL;DR: I'm trying to transcribe multiple files in one run with a fine-tuned Whisper model from Hugging Face and save the output as a single text file.

I have code that works: it transcribes one audio file and shows the output as a string of text. I want to extend it so that it transcribes multiple files in one run and exports the output to a text file in which each line corresponds to a single audio file.

What did I try?

I'm not a coder, but I asked Bing to generate some code and it came up with this, which has errors.

audio_files = ["/content/audio1", "/content/audio2", ..., "/content/audioN"]
transcriptions = []

for audio_file in audio_files:
    transcription = pipe(audio_file, chunk_length_s=10, stride_length_s=(4, 2))
    transcriptions.append(transcription)

with open("transcriptions.txt", "w") as f:
    for transcription in transcriptions:
        f.write(transcription + "\n")

What I want?

I need code that transcribes all my audio files into a single text file in which each line represents one audio file (preferably starting with the file name). If I could specify a folder that holds all the files to transcribe instead of entering each file manually, that would be AWESOME.

What's my workspace?

I'm using a fine-tuned OpenAI Whisper model from Hugging Face to transcribe my files on Google Colab.

Any help is deeply appreciated.

Upvotes: 0

Views: 1268

Answers (1)

Simon Huang

Reputation: 374

import os
from transformers import pipeline
import glob

# Initialize the Whisper pipeline (replace with the path to your fine-tuned model)
pipe = pipeline("automatic-speech-recognition", model="your-finetuned-model")

# Folder containing the audio files (absolute paths are recommended on Colab)
audio_folder = "/content/your_audio_folder/"

# Collect all audio files in the folder (common formats, any letter case)
audio_files = glob.glob(os.path.join(audio_folder, "*.[wW][aA][vV]"))  # WAV
audio_files += glob.glob(os.path.join(audio_folder, "*.[mM][pP]3"))     # MP3
audio_files += glob.glob(os.path.join(audio_folder, "*.[fF][lL][aA][cC]")) # FLAC

# Stop early if no audio files were found
if not audio_files:
    raise ValueError(f"No audio files found in {audio_folder}")

transcriptions = []

for audio_path in audio_files:
    # File name with extension; set before the try so the except branch can always use it
    file_name = os.path.basename(audio_path)
    try:
        # Run the transcription
        result = pipe(
            audio_path,
            chunk_length_s=30,
            stride_length_s=(5, 3),
            return_timestamps=False
        )

        # Extract the text and prefix it with the file name
        transcriptions.append(f"{file_name}| {result['text']}")

        print(f"Transcribed: {file_name}")

    except Exception as e:
        print(f"Error while processing {file_name}: {e}")
        transcriptions.append(f"{file_name}| [transcription failed]")

# Write the text file (UTF-8 encoding for compatibility)
with open("transcriptions.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(transcriptions))

print(f"Done! Processed {len(audio_files)} files; results saved to transcriptions.txt")
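If you want to check the folder scanning and file writing without loading the model, the same idea can be sketched as two small functions. This is only a sketch under a few assumptions: `find_audio_files`, `transcribe_folder` and `AUDIO_SUFFIXES` are names I made up for illustration, and `transcribe` is any function you pass in that takes a file path and returns the text. It also collapses whitespace in each transcript so that every file stays on exactly one line, and sorts the files by name so the output order is predictable.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Assumption: these are the formats you care about; extend the set as needed.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def find_audio_files(folder: str, suffixes: Iterable[str] = AUDIO_SUFFIXES) -> List[Path]:
    """Return the audio files directly inside `folder`, sorted by name."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )


def transcribe_folder(folder: str, transcribe: Callable[[str], str], out_path: str) -> int:
    """Write one 'file_name| text' line per audio file and return the file count."""
    files = find_audio_files(folder)
    with open(out_path, "w", encoding="utf-8") as f:
        for path in files:
            try:
                text = transcribe(str(path))
            except Exception as e:
                # Keep going; record the failure on this file's line
                text = f"[transcription failed: {e}]"
            # Collapse newlines and repeated spaces so each file is one line
            f.write(f"{path.name}| {' '.join(text.split())}\n")
    return len(files)


# Usage with the `pipe` object from the script above (placeholder folder path):
# transcribe_folder(
#     "/content/your_audio_folder/",
#     lambda p: pipe(p, chunk_length_s=30, stride_length_s=(5, 3))["text"],
#     "transcriptions.txt",
# )
```

Passing the transcription step in as a function keeps the file handling separate from the model, so you can try it first with a dummy function such as `lambda p: "test"` and switch to the real pipeline once the output file looks right.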

Upvotes: 0
