Reputation: 41
TL;DR: I'm trying to transcribe multiple audio files in one run with a fine-tuned Whisper model from Hugging Face and save the output as a single text file.
I have code that works: it transcribes one audio file and shows the output as a string of text. I want to extend it to transcribe multiple files in one run and export the output to a text file, with each line representing a single audio file.
I'm not a coder, so I asked Bing to generate some code. It came up with this, which has errors:
audio_files = ["/content/audio1", "/content/audio2", ..., "/content/audioN"]
transcriptions = []
for audio_file in audio_files:
    transcription = pipe(audio_file, chunk_length_s=10, stride_length_s=(4, 2))
    transcriptions.append(transcription)
with open("transcriptions.txt", "w") as f:
    for transcription in transcriptions:
        f.write(transcription + "\n")
I need code that transcribes all of my audio files into a single text file in which each line represents one audio file (preferably starting with the file name). If I could specify a folder containing all the files to transcribe instead of entering each file manually, that would be AWESOME.
I'm using a fine-tuned OpenAI Whisper model from Hugging Face to transcribe my files on Google Colab.
Any help is deeply appreciated.
Upvotes: 0
Views: 1268
Reputation: 374
import glob
import os

from transformers import pipeline

# Initialize the Whisper pipeline (replace with the path to your fine-tuned model)
pipe = pipeline("automatic-speech-recognition", model="your-finetuned-model")

# Folder containing the audio files (an absolute path is recommended in Colab)
audio_folder = "/content/your_audio_folder/"

# Collect all audio files in the folder (common formats, case-insensitive extensions)
audio_files = glob.glob(os.path.join(audio_folder, "*.[wW][aA][vV]"))  # WAV
audio_files += glob.glob(os.path.join(audio_folder, "*.[mM][pP]3"))  # MP3
audio_files += glob.glob(os.path.join(audio_folder, "*.[fF][lL][aA][cC]"))  # FLAC
audio_files.sort()  # glob order is arbitrary; sort for a stable output order

# Stop early if no audio files were found
if not audio_files:
    raise ValueError(f"No audio files found in {audio_folder}")

transcriptions = []
for audio_path in audio_files:
    # File name with extension; set outside the try so the except branch can use it
    file_name = os.path.basename(audio_path)
    try:
        # Run the transcription
        result = pipe(
            audio_path,
            chunk_length_s=30,
            stride_length_s=(5, 3),
            return_timestamps=False,
        )
        # The pipeline returns a dict; prefix its text with the file name
        transcriptions.append(f"{file_name}| {result['text']}")
        print(f"Transcribed: {file_name}")
    except Exception as e:
        print(f"Error while processing {file_name}: {e}")
        transcriptions.append(f"{file_name}| [transcription failed]")

# Write one line per audio file (UTF-8 for compatibility)
with open("transcriptions.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(transcriptions))

print(f"Done! Processed {len(audio_files)} files; results saved to transcriptions.txt")
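As for why the loop in the question fails: the `transformers` speech recognition pipeline returns a dict with the transcript under the `"text"` key, not a plain string, so `transcription + "\n"` raises a `TypeError`. The sketch below isolates just the formatting and writing step so it can be tried without loading a model. `format_line`, `write_transcriptions`, and the sample results are hypothetical names and data made up for illustration, not part of any library.

```python
import os


def format_line(audio_path, result):
    """Build one output line of the form "<file name>| <transcript>".

    `result` is assumed to look like the dict the ASR pipeline returns,
    i.e. {"text": "..."}.
    """
    file_name = os.path.basename(audio_path)
    # Collapse all whitespace (including newlines) so each file stays on one line.
    text = " ".join(result["text"].split())
    return f"{file_name}| {text}"


def write_transcriptions(lines, out_path="transcriptions.txt"):
    """Write one line per audio file, UTF-8 encoded."""
    with open(out_path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Stand-in results (hypothetical data) used in place of real pipeline output.
fake_results = {
    "/content/audio/a.wav": {"text": " hello world "},
    "/content/audio/b.mp3": {"text": "second\nfile"},
}
lines = [format_line(path, res) for path, res in fake_results.items()]
```

In a real run you would pass each `pipe(audio_path, ...)` result to `format_line` and then call `write_transcriptions(lines)` once at the end.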
Upvotes: 0