Reputation: 1
My code works fine and ouputs the transcript into txt file. However what do I need to change if I want to output the transcript in SRT format with timestamps?
import os
import whisper
from tqdm import tqdm
# Define the folder where the mp4 files are located
root_folder = "C:\Video"
# Set up Whisper client
print("Loading whisper model...")
model = whisper.load_model("base")
print("Whisper model complete.")
# Get the number of mp4 files in the root folder and its sub-folders
print("Getting number of files to transcribe...")
num_files = sum(1 for dirpath, dirnames, filenames in os.walk(root_folder) for filename in filenames if filename.endswith(".mp4"))
print("Number of files: ", num_files)
# Transcribe the mp4 files and display a progress bar
with tqdm(total=num_files, desc="Transcribing Files") as pbar:
for dirpath, dirnames, filenames in os.walk(root_folder):
for filename in filenames:
if filename.endswith(".mp4"):
filepath = os.path.join(dirpath, filename)
result = model.transcribe(filepath, fp16=False, verbose=True)
transcription = result['text']
# Write transcription to text file
filename_no_ext = os.path.splitext(filename)[0]
with open(os.path.join(dirpath, filename_no_ext + '.txt'), 'w') as f:
f.write(transcription)
pbar.update(1)
Upvotes: 0
Views: 515