Reputation: 113
I am trying to add subtitles to a video using the method recommended in the docs:
from moviepy.editor import *
from moviepy.video.tools.subtitles import SubtitlesClip
from moviepy.video.io.VideoFileClip import VideoFileClip
generator = lambda txt: TextClip(txt, font='Georgia-Regular', fontsize=24, color='white')
sub = SubtitlesClip("Output.srt", generator)
myvideo = VideoFileClip("video.mp4")
final = CompositeVideoClip([myvideo, sub])
final.write_videofile("final.mp4", fps=myvideo.fps,threads = 4)
This takes more than 2 hours to process. However, when I remove the subtitles (as below), it runs in less than a minute. Please let me know if I am missing something or if this is normal. Appreciated!
from moviepy.editor import *
from moviepy.video.tools.subtitles import SubtitlesClip
from moviepy.video.io.VideoFileClip import VideoFileClip
myvideo = VideoFileClip("video.mp4")
final = CompositeVideoClip([myvideo])
final.write_videofile("final.mp4", fps=myvideo.fps,threads = 4)
Upvotes: 0
Views: 1467
Reputation: 113
The issue was that my "Output.srt" file had wrong timestamps that ran far past the end of the video, so the composite clip was presumably being rendered for much longer than the actual video.
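To check whether a subtitle file has this problem before rendering, something like the following could help. It is only a minimal sketch using the standard library: the helper name `last_cue_end_seconds` is made up for illustration, and it assumes timestamps in the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` SRT form.

```python
import re

# Matches one SRT cue timing line, e.g. "00:00:01,000 --> 00:00:03,500".
_CUE = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def last_cue_end_seconds(srt_text: str) -> float:
    """Return the latest cue end time found in the SRT text, in seconds.

    Hypothetical helper for illustration; returns 0.0 if no cue is found.
    """
    latest = 0.0
    for match in _CUE.finditer(srt_text):
        # Groups 5-8 hold the end timestamp of the cue.
        hours, minutes, seconds, millis = (int(g) for g in match.groups()[4:])
        end = hours * 3600 + minutes * 60 + seconds + millis / 1000
        latest = max(latest, end)
    return latest


sample = (
    "1\n00:00:01,000 --> 00:00:03,500\nhi\n\n"
    "2\n00:10:00,000 --> 01:00:05,250\nbye\n"
)
print(last_cue_end_seconds(sample))
```

If the value returned for "Output.srt" is much larger than `myvideo.duration`, the subtitle timestamps are the likely cause of the long render.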
For reference, for anyone using pytube to download captions: replace the 'xml_caption_to_srt' method in the pytube source code, located in the captions.py module, with the following:
def xml_caption_to_srt(self, xml_captions: str) -> str:
    """Convert xml caption tracks to "SubRip Subtitle (srt)".

    :param str xml_captions:
        XML formatted caption tracks.
    """
    segments = []
    root = ElementTree.fromstring(xml_captions)[1]
    i = 0
    for child in list(root):
        if child.tag == 'p':
            caption = ''
            # Skip <p> elements that have no child segments.
            if len(list(child)) == 0:
                continue
            for s in list(child):
                if s.tag == 's':
                    caption += ' ' + s.text
            caption = unescape(caption.replace("\n", " ").replace("  ", " "))
            try:
                duration = float(child.attrib["d"]) / 1000.0
            except KeyError:
                duration = 0.0
            start = float(child.attrib["t"]) / 1000.0
            end = start + duration
            sequence_number = i + 1  # convert from 0-indexed to 1.
            line = "{seq}\n{start} --> {end}\n{text}\n".format(
                seq=sequence_number,
                start=self.float_to_srt_time_format(start),
                end=self.float_to_srt_time_format(end),
                text=caption,
            )
            segments.append(line)
            i += 1
    return "\n".join(segments).strip()
Using that method, you can extract captions with the correct timestamps as below:
from pytube import YouTube

yt_transcript = YouTube('video_link')
caption = yt_transcript.captions['a.en']
en_caption_convert_to_srt = caption.generate_srt_captions()
# Save the captions to a file named Output.srt
with open("Output.srt", "w") as text_file:
    text_file.write(en_caption_convert_to_srt)
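To sanity-check the saved file, it may help to see what a single cue should look like given a start time and duration in seconds, which is what the method above computes from the "t" and "d" attributes. The sketch below is not pytube's implementation: `seconds_to_srt_time` and `make_cue` are hypothetical names used only for illustration.

```python
def seconds_to_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    # Work in whole milliseconds to avoid float formatting surprises.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def make_cue(seq: int, start: float, duration: float, text: str) -> str:
    """Build one SRT cue block from a start time and duration in seconds."""
    end = start + duration
    return (
        f"{seq}\n"
        f"{seconds_to_srt_time(start)} --> {seconds_to_srt_time(end)}\n"
        f"{text}\n"
    )


print(make_cue(1, 1.0, 2.5, "hello"))
```

The end time of each cue is the start plus the duration, so no cue in a correct file should end later than the video itself.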
Upvotes: 0