Yazid

Reputation: 113

moviePy add subtitles

I am trying to add subtitles to a video using the method recommended in the docs:

from moviepy.editor import *
from moviepy.video.tools.subtitles import SubtitlesClip
from moviepy.video.io.VideoFileClip import VideoFileClip


generator = lambda txt: TextClip(txt, font='Georgia-Regular', fontsize=24, color='white')
sub = SubtitlesClip("Output.srt", generator)
myvideo = VideoFileClip("video.mp4")

final = CompositeVideoClip([myvideo, sub])
final.write_videofile("final.mp4", fps=myvideo.fps, threads=4)

This takes more than 2 hours to process. However, when I remove the subtitles (as below), it runs in less than a minute. Please let me know if I am missing something or if this is normal. Thanks!

from moviepy.editor import *
from moviepy.video.tools.subtitles import SubtitlesClip
from moviepy.video.io.VideoFileClip import VideoFileClip


myvideo = VideoFileClip("video.mp4")

final = CompositeVideoClip([myvideo])
final.write_videofile("final.mp4", fps=myvideo.fps,threads = 4)

Upvotes: 0

Views: 1467

Answers (1)

Yazid

Reputation: 113

The issue was that my "Output.srt" file had wrong timestamps that extended far beyond the length of the video.
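A quick way to spot this kind of problem before rendering is to compare the timestamps in the .srt file against the video length. The snippet below is only a rough sketch using the standard library: the helper names (`to_seconds`, `cues_past_end`) and the sample captions are made up for illustration, and it assumes the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def cues_past_end(srt_text, video_duration):
    """Return (start, end) pairs, in seconds, for cues that end after the video."""
    late = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        start, end = to_seconds(*fields[:4]), to_seconds(*fields[4:])
        if end > video_duration:
            late.append((start, end))
    return late


# Hypothetical sample: the second cue starts two hours into a one-minute video.
sample = (
    "1\n00:00:01,000 --> 00:00:03,500\nHello\n\n"
    "2\n02:00:00,000 --> 02:00:02,000\nFar too late\n"
)
print(cues_past_end(sample, video_duration=60.0))  # [(7200.0, 7202.0)]
```

With MoviePy, the duration could be taken from `myvideo.duration` and the text read from "Output.srt". If the list is not empty, the composite clip is probably being stretched to the last subtitle's end time, which would explain the very long render.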

For reference, for anyone using pytube to download captions: replace the 'xml_caption_to_srt' method in the pytube source code (in the captions.py module) with the following:

    def xml_caption_to_srt(self, xml_captions: str) -> str:
        """Convert xml caption tracks to "SubRip Subtitle (srt)".

        :param str xml_captions:
            XML formatted caption tracks.
        """
        segments = []
        root = ElementTree.fromstring(xml_captions)[1]
        i = 0
        for child in list(root):
            if child.tag == 'p':
                caption = ''
                if len(list(child)) == 0:
                    continue
                for s in list(child):
                    if s.tag == 's':
                        # s.text can be None for an empty <s> element
                        caption += ' ' + (s.text or '')
                caption = unescape(caption.replace("\n", " ").replace("  ", " "))
                try:
                    duration = float(child.attrib["d"]) / 1000.0
                except KeyError:
                    duration = 0.0
                start = float(child.attrib["t"]) / 1000.0
                end = start + duration
                sequence_number = i + 1  # convert from 0-indexed to 1.
                line = "{seq}\n{start} --> {end}\n{text}\n".format(
                    seq=sequence_number,
                    start=self.float_to_srt_time_format(start),
                    end=self.float_to_srt_time_format(end),
                    text=caption,
                )
                segments.append(line)
                i += 1
        return "\n".join(segments).strip()

With that method in place, you can extract captions with the correct timestamps as below:

from pytube import YouTube

yt_transcript = YouTube('video_link')

caption = yt_transcript.captions['a.en']

en_caption_convert_to_srt = caption.generate_srt_captions()

# save the captions to a file named Output.srt
with open("Output.srt", "w") as text_file:
    text_file.write(en_caption_convert_to_srt)

Upvotes: 0
