Reputation: 678
I am building a web application that serves various kinds of video content. Web-friendly audio and video codecs are handled without any problems, but I am having trouble designing the delivery of video files that are incompatible with HTML5 video players, such as MKV containers or H.265 video.
What I have done so far is use ffmpeg to transcode the video file on the server, produce HLS master and VOD playlists, and play them with hls.js on the frontend. The problem, however, is that ffmpeg treats the playlist as a live-stream playlist until transcoding of the whole file is complete, and only then rewrites it as a VOD playlist. As a result, the user can't seek until the transcoding is over, and my server has transcoded the whole file unnecessarily if the user decides to seek halfway ahead. I am using the following ffmpeg command-line arguments:
ffmpeg -i sample.mkv \
-c:v libx264 \
-crf 18 \
-preset ultrafast \
-maxrate 4000k \
-bufsize 8000k \
-vf "scale=1280:-1,format=yuv420p" \
-c:a copy -start_number 0 \
-hls_time 10 \
-hls_list_size 0 \
-f hls \
file.m3u8
Now, to improve upon this system, I tried to generate the VOD playlist through my app instead of ffmpeg, since the format is self-explanatory. The web app would generate the HLS master and VOD playlists up front using video properties already known to the server (duration, resolution, and bitrate) and serve the master playlist to the client. The client then starts requesting individual video segments, at which point the server transcodes and generates each segment individually and serves it. Seeking would be possible because the client already has the complete VOD playlist and can request the specific segment the user seeks to. The benefit, as I see it, is that my server would not have to transcode the whole file if the user decides to seek forward and play the video halfway through.
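The playlist-generation step itself is simple enough; here is a rough sketch in Python of what my app does (the function and segment names are illustrative, not my actual code):

```python
import math

def make_vod_playlist(duration, segment_length=10, name="fileSequence"):
    """Build an HLS VOD playlist from the known media duration.

    Segment durations are fixed up front; only the last segment is
    shorter, covering the remainder of the file.
    """
    n = math.ceil(duration / segment_length)
    lines = [
        "#EXTM3U",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        f"#EXT-X-TARGETDURATION:{segment_length}",
        "#EXT-X-VERSION:4",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i in range(n):
        seg = min(segment_length, duration - i * segment_length)
        lines.append(f"#EXTINF:{seg:.1f},")
        lines.append(f"{name}{i}.mp4")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)

print(make_vod_playlist(95.0))
```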
Next, I tried manually creating the segments (10 s each) from my sample.mkv using the following command:
ffmpeg -ss 90 \
-t 10 \
-i sample.mkv \
-g 52 \
-strict experimental \
-movflags +frag_keyframe+separate_moof+omit_tfhd_offset+empty_moov \
-c:v libx264 \
-crf 18 \
-preset ultrafast \
-maxrate 4000k \
-bufsize 8000k \
-vf "scale=1280:-1,format=yuv420p" \
-c:a copy \
fileSequence0.mp4
and so on for the other segments, and wrote the VOD playlist as:
#EXTM3U
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-TARGETDURATION:10
#EXT-X-VERSION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
fileSequence0.mp4
#EXTINF:10.0,
fileSequence1.mp4
...
... and so on
...
#EXT-X-ENDLIST
which plays the first segment just fine but not the subsequent ones.
Now, my questions:
Why don't the subsequent segments play? What am I doing wrong?
Is my technique even viable? Would there be any problem with presetting the segment durations, given that segmenting is only possible at keyframes? Can ffmpeg work around this?
My knowledge of video processing and generation is modest at best. I would greatly appreciate some pointers.
Upvotes: 12
Views: 12102
Reputation:
You could add an #EXT-X-DISCONTINUITY
tag for each segment after the first.
I use #EXT-X-DISCONTINUITY
tags with MPEG-TS and SCTE-35 to splice in ads with completely different timestamps and continuity counters.
So something like this:
#EXTM3U
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-TARGETDURATION:10
#EXT-X-VERSION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
fileSequence0.mp4
#EXT-X-DISCONTINUITY
#EXTINF:10.0,
fileSequence1.mp4
#EXT-X-DISCONTINUITY
#EXTINF:10.0,
fileSequence2.mp4
...
... and so on
...
#EXT-X-ENDLIST
A quote from the RFC:
The client MUST be prepared to reset its parser(s) and decoder(s)
before playing a Media Segment that has an EXT-X-DISCONTINUITY tag
applied to it; otherwise, playback errors can occur.
Upvotes: 1
Reputation: 797
We have a similar setup for serving real-time transcodes on our server, so reading this, I see a lot of similarities. I hope I can help a bit.
First, a few things I've noticed:
I'm not quite sure why you want to encode the content on demand (to create a unique file each time, to save disk space or time, or for other reasons), so some of my ideas may not apply here, but maybe some of them will help:
Encode the video up front and serve that file instead. This takes some encoding time, but you can really let your server crunch the numbers, making the file as small as possible, which saves you hard disk and bandwidth costs, and you only need to do the encoding once.
If you do want to encode the file on demand, I would still transcode the source into an intermediate format that ffmpeg can read really fast. You could also encode the audio into its final format at this stage, so you don't have to do it later. Then, when it's time for the on-demand encode, ffmpeg only needs to encode video from a file that's really fast to read and copy the audio in. I would also consider converting the intermediate file from 4k60 to 4k30, or even to 1080p30; this makes encoding a lot faster.
If you have access to the server, I really recommend adding a GPU: either Intel Quick Sync on the CPU, or an NVIDIA Quadro P400 card, which is inexpensive but much faster and delivers better quality at smaller file sizes. (ffmpeg can do this; no other software is required.) A single Quadro card can reach 2x 300 fps at 1080p, so it can get a 20-minute video ready in just a minute.
Depending on the duration and size of the video, this can be done really fast. And if you want to go even faster, you can parallelize across additional encoding servers/GPUs. Keep in mind that your storage will need to keep up; our bottleneck turned out to be the gigabit network connection.
With enough power at your disposal, the video could be completely encoded in 30 seconds or so. Before the video starts, you can show an ad, a countdown clock, or even just an hourglass, and then the user has a fully seekable video. This sidesteps the problem altogether.
That's at least how we approached the problem. It works for us, but it's a kind of brute-force solution.
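The intermediate-file idea could be sketched as two ffmpeg invocations built in Python. The codec and filter choices here (lossless ultrafast x264 for the fast-to-decode intermediate, AAC as the final audio) are assumptions for illustration, not our exact pipeline:

```python
def intermediate_command(src, dst="intermediate.mkv"):
    # One-time prepass: a fast-to-decode (lossless, ultrafast) video
    # stream plus the *final* audio encode, so later on-demand runs
    # only re-encode video and copy the audio straight in.
    return [
        "ffmpeg", "-i", src,
        "-vf", "scale=1920:-2,fps=30",   # drop e.g. 4k60 to 1080p30 up front
        "-c:v", "libx264", "-preset", "ultrafast", "-qp", "0",
        "-c:a", "aac", "-b:a", "160k",   # audio already in its final format
        dst,
    ]

def on_demand_command(intermediate, dst="out.mp4"):
    # Later, the actual encode reads the fast intermediate
    # and copies the audio as-is.
    return ["ffmpeg", "-i", intermediate,
            "-c:v", "libx264", "-crf", "18",
            "-c:a", "copy", dst]
```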
Upvotes: 3
Reputation: 31140
It is possible, but it is very difficult. I would even argue that it may not be possible with ffmpeg. Transport streams have timestamps and continuity counters, and these values should be preserved across segment boundaries. The -copyts flag may help a little with that. B-frames are extremely difficult to handle in this case because they will end up with timestamps outside the segment. Audio is difficult as well: audio has priming samples emitted when the encoder is initialized, meaning you may get extra samples in every segment, which come through as audio pops.
TL;DR: it is possible, but you need to understand how the container and the underlying codecs are structured, and work with them.
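As a sketch of the timestamp issue: with -ss placed before -i, each segment's timestamps restart near zero, so they must be shifted back to the segment's true position. One option is ffmpeg's -output_ts_offset output option (an alternative to -copyts for this purpose); here built as an argument list in Python, with illustrative segment naming and no claim that this alone solves the B-frame and audio-priming problems:

```python
def segment_command(src, index, segment_length=10):
    """Build an ffmpeg argument list for one on-demand HLS segment.

    -ss before -i resets timestamps near zero; -output_ts_offset
    shifts them back to the segment's position in the whole file.
    A sketch of the idea, not a verified recipe.
    """
    start = index * segment_length
    return [
        "ffmpeg",
        "-ss", str(start),                # fast input seek
        "-t", str(segment_length),        # one segment's worth
        "-i", src,
        "-c:v", "libx264", "-crf", "18", "-preset", "ultrafast",
        "-c:a", "copy",
        "-output_ts_offset", str(start),  # restore original timestamps
        "-f", "mpegts",                   # TS carries explicit timestamps
        f"fileSequence{index}.ts",
    ]

print(" ".join(segment_command("sample.mkv", 9)))
```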
Upvotes: 2