Reputation: 1933
I'm trying to edit segments of multiple movies together into one clip using Swift, AVFoundation, and AVKit on macOS. The following Swift code is a good example of what I'm trying to do:
import AVFoundation
import AVKit

let source1 = AVAsset(url: URL(string: "")!)
let source2 = AVAsset(url: URL(string: "")!)

let comp = AVMutableComposition()
comp.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)
comp.addMutableTrack(withMediaType: .audio, preferredTrackID: kCMPersistentTrackID_Invalid)

func cmtime(_ i: Double) -> CMTime {
    return CMTime(seconds: i, preferredTimescale: 600)
}

// Copies one second of video and audio, beginning at `start` in the source, to time `at` in the composition.
func insertSecond(into: AVMutableComposition, from: AVAsset, start: CMTime, at: CMTime) throws {
    let videoTrack = into.tracks(withMediaType: .video).first!
    let audioTrack = into.tracks(withMediaType: .audio).first!
    try videoTrack.insertTimeRange(
        CMTimeRange(start: start, duration: cmtime(1.0)),
        of: from.tracks(withMediaType: .video).first!,
        at: at
    )
    try audioTrack.insertTimeRange(
        CMTimeRange(start: start, duration: cmtime(1.0)),
        of: from.tracks(withMediaType: .audio).first!,
        at: at
    )
}

try insertSecond(into: comp, from: source1, start: cmtime(3.0), at: cmtime(0.0))
try insertSecond(into: comp, from: source2, start: cmtime(2.0), at: cmtime(1.0))
try insertSecond(into: comp, from: source1, start: cmtime(100.0), at: cmtime(2.0))
try insertSecond(into: comp, from: source2, start: cmtime(3.0), at: cmtime(3.0))
try insertSecond(into: comp, from: source1, start: cmtime(350.0), at: cmtime(4.0))

if let sess = AVAssetExportSession(asset: comp, presetName: "AVAssetExportPresetHighestQuality") {
    sess.outputURL = URL(fileURLWithPath: "/tmp/output.mp4")
    sess.outputFileType = .mp4
    sess.exportAsynchronously {
        print(sess.error ?? "success")
    }
}
Running this code does produce an output.mp4 file successfully, and that file can be played in QuickTime with no problems. You should be able to paste the above code into a Playground to reproduce the video (the source videos are all publicly available sample videos hosted on the web). I've also uploaded it to S3 here, so you can download and analyze it without having to run the code yourself.
However, attempting to open or process it with any other video software results in errors.
VLC will attempt to play the file, but has a very difficult time with it. The video freezes a lot, desyncs with the audio, includes frames that Quicktime doesn't show at all, and skips some sections entirely.
Firefox will also attempt to play the file, but clearly can't decode it properly and has glitchy video output. Chrome freezes after the first second of playback.
I tried diagnosing further using ffprobe and ffmpeg. Running ffprobe -show_frames output.mp4 1>/dev/null reports:
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7fe841801800] DTS -24000 < 24000 out of order
[h264 @ 0x7fe843022800] reference count overflow
[h264 @ 0x7fe843022800] decode_slice_header error
[h264 @ 0x7fe843022800] no frame!
[h264 @ 0x7fe843022800] deblocking_filter_idc 6 out of range
[h264 @ 0x7fe843022800] decode_slice_header error
[h264 @ 0x7fe843022800] no frame!
[h264 @ 0x7fe843022800] deblocking_filter_idc 6 out of range
[h264 @ 0x7fe843022800] decode_slice_header error
[h264 @ 0x7fe843022800] no frame!
[h264 @ 0x7fe843022800] top block unavailable for requested intra mode -1
[h264 @ 0x7fe843022800] error while decoding MB 5 0, bytestream 947
[h264 @ 0x7fe843022800] concealing 3600 DC, 3600 AC, 3600 MV errors in P frame
[h264 @ 0x7fe843022800] mmco: unref short failure
[h264 @ 0x7fe843022800] cabac_init_idc 4 overflow
[h264 @ 0x7fe843022800] decode_slice_header error
[h264 @ 0x7fe843022800] no frame!
[h264 @ 0x7fe843022800] deblocking filter parameters -43 0 out of range
[h264 @ 0x7fe843022800] decode_slice_header error
[h264 @ 0x7fe843022800] no frame!
Attempting to transcode to another format with ffmpeg (ffmpeg -i output.mp4 output.avi) produces a lot of warnings and errors:
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 73, current: 71; changing to 74. This may result in incorrect timestamps in the output file.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7fe1f4802800] DTS -24000 < 24000 out of order
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 74, current: 72; changing to 75. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 75, current: 73; changing to 76. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 76, current: 74; changing to 77. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 77, current: 75; changing to 78. This may result in incorrect timestamps in the output file.
[h264 @ 0x7fe1f4849600] reference count overflow
[h264 @ 0x7fe1f4849600] decode_slice_header error
[h264 @ 0x7fe1f4849600] no frame!
Error while decoding stream #0:1: Invalid data found when processing input
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 145, current: 143; changing to 146. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 146, current: 144; changing to 147. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 147, current: 145; changing to 148. This may result in incorrect timestamps in the output file.
[h264 @ 0x7fe1f483d800] deblocking_filter_idc 6 out of range
[h264 @ 0x7fe1f483d800] decode_slice_header error
[h264 @ 0x7fe1f483d800] no frame!
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 148, current: 146; changing to 149. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 149, current: 147; changing to 150. This may result in incorrect timestamps in the output file.
[h264 @ 0x7fe1f4849600] deblocking_filter_idc 6 out of range
[h264 @ 0x7fe1f4849600] decode_slice_header error
[h264 @ 0x7fe1f4849600] no frame!
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 150, current: 148; changing to 151. This may result in incorrect timestamps in the output file.
Error while decoding stream #0:1: Invalid data found when processing input
Last message repeated 1 times
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 151, current: 149; changing to 152. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 152, current: 150; changing to 153. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 153, current: 151; changing to 154. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 154, current: 152; changing to 155. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 155, current: 153; changing to 156. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 156, current: 154; changing to 157. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 157, current: 155; changing to 158. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 158, current: 156; changing to 159. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 159, current: 157; changing to 160. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 160, current: 158; changing to 161. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 161, current: 159; changing to 162. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 162, current: 160; changing to 163. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 163, current: 161; changing to 164. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 164, current: 162; changing to 165. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 165, current: 163; changing to 166. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 166, current: 164; changing to 167. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 167, current: 165; changing to 168. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 168, current: 166; changing to 169. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 169, current: 167; changing to 170. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 170, current: 168; changing to 171. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 171, current: 169; changing to 172. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 172, current: 170; changing to 173. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 173, current: 171; changing to 174. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 174, current: 172; changing to 175. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 175, current: 173; changing to 176. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 176, current: 174; changing to 177. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 177, current: 175; changing to 178. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 178, current: 176; changing to 179. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 179, current: 177; changing to 180. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 180, current: 178; changing to 181. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 181, current: 179; changing to 182. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 182, current: 180; changing to 183. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 183, current: 181; changing to 184. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 184, current: 182; changing to 185. This may result in incorrect timestamps in the output file.
[h264 @ 0x7fe1f483d800] top block unavailable for requested intra mode -1
[h264 @ 0x7fe1f483d800] error while decoding MB 5 0, bytestream 947
[h264 @ 0x7fe1f483d800] concealing 3600 DC, 3600 AC, 3600 MV errors in P frame
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 185, current: 183; changing to 186. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 186, current: 184; changing to 187. This may result in incorrect timestamps in the output file.
[h264 @ 0x7fe1f4849600] mmco: unref short failure
[h264 @ 0x7fe1f4849600] cabac_init_idc 4 overflow
[h264 @ 0x7fe1f4849600] decode_slice_header error
[h264 @ 0x7fe1f4849600] no frame!
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 187, current: 185; changing to 188. This may result in incorrect timestamps in the output file.
[h264 @ 0x7fe1f485fa00] deblocking filter parameters -43 0 out of range
[h264 @ 0x7fe1f485fa00] decode_slice_header error
[h264 @ 0x7fe1f485fa00] no frame!
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 188, current: 186; changing to 189. This may result in incorrect timestamps in the output file.
Error while decoding stream #0:1: Invalid data found when processing input
Last message repeated 1 times
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 189, current: 187; changing to 190. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 190, current: 188; changing to 191. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 191, current: 189; changing to 192. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 192, current: 190; changing to 193. This may result in incorrect timestamps in the output file.
[avi @ 0x7fe1f5804e00] Non-monotonous DTS in output stream 0:1; previous: 193, current: 191; changing to 194. This may result in incorrect timestamps in the output file.
The code above is just one example; I've seen similar problems with varying degrees of severity in lots of variations on this code. I've tried many things, including:
rather than .mp4
and AVMutableMovie (and various setting tweaks, e.g. setting AVURLAssetPreferPreciseDurationAndTimingKey to true) rather than AVMutableComposition
objects in different ways
but to no avail. I can't seem to get AVFoundation to produce a video file that other tools can process.
Any help is appreciated, even just any thoughts on what's unusual about the encoding of the output file, which you can download here if you can't or don't want to run the above Swift code to reproduce it yourself.
Upvotes: 3
Views: 2454
Reputation: 622
I agree that the playback problems in the other players stem from having more than one format description per track. But there is actually no need for an expensive transcoding of tracks before compositing; AVFoundation can do this for you... if you're willing to jump through a few hoops.
The key is that an AVMutableComposition can have more than one track of a certain media type, and that AVAssetExportSession can "mix" down such compositions to just one track for each media type. AVFoundation kind of acknowledges the problem with multiple format descriptions per track by providing mutableTrackCompatibleWithTrack: (mutableTrack(compatibleWith:) in Swift). So when you want to insert a segment from a given source track, you can ask AVMutableComposition for a suitable target track, and if none is returned, add a new one.
As mentioned, there are a few things to keep in mind:
you can't insert into a given destination track somewhere in "void time" beyond its current end. To work around that, note the current end time of the destination track, append the segment at that time, and after that insert an empty segment of the right duration at the former track end. The sample below shows this, under the simplifying assumption that you're always appending. If you want to insert anywhere in an existing track, you'll need somewhat more elaborate logic there.
to actually have AVAssetExportSession mix everything down to just one track per media type, you have to set an AVAudioMix and an AVVideoComposition on the export session.
The sample code below, based on your original example, produces an output.mp4 which plays correctly in VLC, Chrome and Firefox and throws no errors when examined with ffmpeg.
import AVFoundation
import Foundation

let source0 = AVAsset(url: URL(string: "")!)
let source1 = AVAsset(url: URL(string: "")!)

let comp = AVMutableComposition()

func cmtime(_ i: Double) -> CMTime {
    return CMTime(seconds: i, preferredTimescale: 600)
}

func insertTrackSecond(srcAsset: AVAsset, dstComp: AVMutableComposition, mediaType: AVMediaType, start: CMTime, at: CMTime) throws {
    let srcTrack: AVAssetTrack = srcAsset.tracks(withMediaType: mediaType).first!

    // get a compatible destination track or, if not available, create a new one
    let dstTrack: AVMutableCompositionTrack = dstComp.mutableTrack(compatibleWith: srcTrack) ?? dstComp.addMutableTrack(withMediaType: mediaType, preferredTrackID: kCMPersistentTrackID_Invalid)!

    // can't insert into "void" time beyond the current end of track. Instead, note current end time, append there, and *after* appending, insert empty range
    var dstTrackEnd: CMTime = dstTrack.timeRange.end
    if !dstTrackEnd.isValid {
        dstTrackEnd = .zero
    }
    try dstTrack.insertTimeRange(CMTimeRange(start: start, duration: cmtime(1.0)), of: srcTrack, at: dstTrackEnd)

    // now add empty range, if necessary
    if dstTrackEnd < at {
        dstTrack.insertEmptyTimeRange(CMTimeRange(start: dstTrackEnd, end: at))
    }
}

func insertSecond(srcAsset: AVAsset, dstComp: AVMutableComposition, start: CMTime, at: CMTime) throws {
    try insertTrackSecond(srcAsset: srcAsset, dstComp: dstComp, mediaType: .video, start: start, at: at)
    try insertTrackSecond(srcAsset: srcAsset, dstComp: dstComp, mediaType: .audio, start: start, at: at)
}

try insertSecond(srcAsset: source0, dstComp: comp, start: cmtime(3.0), at: cmtime(0.0))
try insertSecond(srcAsset: source1, dstComp: comp, start: cmtime(2.0), at: cmtime(1.0))
try insertSecond(srcAsset: source0, dstComp: comp, start: cmtime(100.0), at: cmtime(2.0))
try insertSecond(srcAsset: source1, dstComp: comp, start: cmtime(3.0), at: cmtime(3.0))
try insertSecond(srcAsset: source0, dstComp: comp, start: cmtime(350.0), at: cmtime(4.0))

if let sess = AVAssetExportSession(asset: comp, presetName: "AVAssetExportPresetHighestQuality") {
    sess.outputURL = URL(fileURLWithPath: "/tmp/output.mp4")
    sess.outputFileType = .mp4

    // this leaves smaller video tracks at the origin, in their "natural" size. Manipulate the "preferredTransform" property of the mutable composition tracks for nicer results
    sess.videoComposition = AVVideoComposition(propertiesOf: comp)

    // not assigning an audio mix results in an output with multiple audio tracks
    var inputParameters = [AVAudioMixInputParameters]()
    for audioTrack in comp.tracks(withMediaType: .audio) {
        inputParameters.append(AVMutableAudioMixInputParameters(track: audioTrack))
    }
    let audioMix = AVMutableAudioMix()
    audioMix.inputParameters = inputParameters
    sess.audioMix = audioMix

    // block until the asynchronous export has finished, so that a script does not exit early
    let semaphore = DispatchSemaphore(value: 0)
    sess.exportAsynchronously {
        print(sess.error ?? "success")
        semaphore.signal()
    }
    semaphore.wait()
}
Reputation: 93319
As @RhythmicFistman says, your video stream is a concat of multiple H264 streams with differing properties. The parameters of an H264 stream can be stored, typically, either in-band (called Annex B) or within the global metadata when stored in a container like MP4 (stsd). What AVF did here was add multiple stsd entries.
stsd: s= 326 (0x00000146), o= 1982552 (0x001e4058)
version: 0
flags: 0x000000
sample_descriptions (0x00000002):
size: 0000009b
data_format: avc1 (61 76 63 31)
size: 0000009b
data_format: avc1 (61 76 63 31)
Most players will ignore the additional entries but the decoder needs this bitstream configuration for (re-)initialization.
There are two ways forward.
Re-encode each individual segment with the same encoding properties so that, post-concat, the lack of decoder reinit is effectively not an issue.
Or get AVF or another tool like mp4box to concat the streams as an avc3 stream, whereby the bitstream parameters are also stored in-band. The decoder should then encounter the new parameter sets and reinit.
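The multiple stsd entries can also be checked from Swift without a box dumper. The following is only a rough sketch: it assumes that each stsd entry shows up as one element of AVAssetTrack's formatDescriptions array, and the helper name formatDescriptionCounts is made up for illustration. The file path is the one used by the export code in the question.

```swift
import AVFoundation

/// For each track of `asset`, returns its media type and the number of
/// format descriptions it carries. A count above 1 suggests that the
/// track was written with more than one sample description.
/// (Hypothetical helper, not part of AVFoundation.)
func formatDescriptionCounts(of asset: AVAsset) -> [(mediaType: AVMediaType, count: Int)] {
    return asset.tracks.map { track in
        (mediaType: track.mediaType, count: track.formatDescriptions.count)
    }
}

// Path taken from the question's export code; if the file is missing,
// the asset simply reports no tracks and nothing is printed.
let exported = AVAsset(url: URL(fileURLWithPath: "/tmp/output.mp4"))
for entry in formatDescriptionCounts(of: exported) {
    print("\(entry.mediaType.rawValue): \(entry.count) format description(s)")
}
```

If a track reports more than one format description, players that only honor the first entry would likely run into the decoder errors shown in the question.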
Upvotes: 2
Reputation: 36169
This problem is almost certainly because the clip formats, both video and audio, do not match:
There are many things AVAssetExportSession could have done, but it seems to have chosen to dump all the formats in there, which probably explains your compatibility problems. I can see why a player would be confused: the two video formats don't even have the same aspect ratio. Maybe this behaviour is a bug, or maybe it makes perfect sense in some situation. I don't know.
So you could:
+ AVAssetExportSession
or AVAssetReader
+ AVAssetWriter
p.s. I suspect video mismatches are more important than audio here, so perhaps testing will show that you can ignore audio?
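As a quick way to see the video mismatch before compositing, one could compare the natural sizes of the source video tracks. This is a minimal sketch, not a full format comparison: the helper names sameAspectRatio and firstVideoSize, the tolerance value, and the local file paths are all assumptions made for illustration, and naturalSize does not take a track's preferredTransform into account.

```swift
import AVFoundation

/// True when the two sizes have the same width-to-height ratio, within `tolerance`.
/// (Hypothetical helper; the default tolerance is an arbitrary choice.)
func sameAspectRatio(_ a: CGSize, _ b: CGSize, tolerance: CGFloat = 0.01) -> Bool {
    guard a.height > 0, b.height > 0 else { return false }
    return abs(a.width / a.height - b.width / b.height) <= tolerance
}

/// Natural size of the first video track, or nil when the asset has none.
func firstVideoSize(of asset: AVAsset) -> CGSize? {
    return asset.tracks(withMediaType: .video).first?.naturalSize
}

// Hypothetical local copies of the two source clips.
let clipA = AVAsset(url: URL(fileURLWithPath: "/tmp/source1.mp4"))
let clipB = AVAsset(url: URL(fileURLWithPath: "/tmp/source2.mp4"))
if let sizeA = firstVideoSize(of: clipA), let sizeB = firstVideoSize(of: clipB) {
    print("source1: \(sizeA), source2: \(sizeB), same aspect ratio: \(sameAspectRatio(sizeA, sizeB))")
} else {
    print("could not read a video track from one of the sources")
}
```

Matching sizes do not guarantee matching encoder parameters, so this only rules out the most obvious kind of mismatch.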
Upvotes: 2