I am using an AVCaptureSession to capture video and audio input and encode an H.264 video with AVAssetWriter. If I don't write the audio, the video is encoded as expected. But if I write the audio, I get a corrupt video.
If I inspect the audio CMSampleBuffer being supplied to the AVAssetWriter, it shows this information:
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
formatDescription = <CMAudioFormatDescription 0x17410ba30 [0x1b3a70bb8]> {
mediaSpecific: {
mSampleRate: 44100.000000
mFormatID: 'lpcm'
mFormatFlags: 0xc
mBytesPerPacket: 2
mFramesPerPacket: 1
mBytesPerFrame: 2
mChannelsPerFrame: 1
mBitsPerChannel: 16 }
cookie: {(null)}
ACL: {(null)}
FormatList Array: {(null)}
extensions: {(null)}
Since it is supplying lpcm audio, I have configured the AVAssetWriterInput with this setting for sound (I have tried both one and two channels):
var channelLayout = AudioChannelLayout()
memset(&channelLayout, 0, MemoryLayout<AudioChannelLayout>.size)
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Mono
let audioOutputSettings: [String: Any] = [
    AVFormatIDKey as String: UInt(kAudioFormatLinearPCM),
    AVNumberOfChannelsKey as String: 1,
    AVSampleRateKey as String: 44100.0,
    AVLinearPCMIsBigEndianKey as String: false,
    AVLinearPCMIsFloatKey as String: false,
    AVLinearPCMBitDepthKey as String: 16,
    AVLinearPCMIsNonInterleaved as String: false,
    AVChannelLayoutKey: NSData(bytes: &channelLayout, length: MemoryLayout<AudioChannelLayout>.size)
]
self.assetWriterAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio, outputSettings: audioOutputSettings)
When I use the lpcm setting above, I cannot open the video with any application. I have tried using kAudioFormatMPEG4AAC and kAudioFormatAppleLossless, and I still get a corrupt video. In that case I am able to view the video using QuickTime Player 8 (not QuickTime Player 7), but it is confused about the duration of the video and no sound is played.
When recording is complete I am calling:
func endRecording(_ completionHandler: @escaping () -> ()) {
    isRecording = false
    assetWriter.finishWriting(completionHandler: completionHandler)
}
This is how the AVCaptureSession is being configured:
func setupCapture() {
    captureSession = AVCaptureSession()
    if (captureSession == nil) {
        fatalError("ERROR: Couldn't create a capture session")
    }
    captureSession?.sessionPreset = AVCaptureSessionPreset1280x720
    let frontDevices = AVCaptureDevice.devices().filter{ ($0 as AnyObject).hasMediaType(AVMediaTypeVideo) && ($0 as AnyObject).position == AVCaptureDevicePosition.front }
    if let captureDevice = frontDevices.first as? AVCaptureDevice {
        do {
            let videoDeviceInput: AVCaptureDeviceInput
            do {
                videoDeviceInput = try AVCaptureDeviceInput(device: captureDevice)
            }
            catch {
                fatalError("Could not create AVCaptureDeviceInput instance with error: \(error).")
            }
            guard (captureSession?.canAddInput(videoDeviceInput))! else {
                fatalError("Could not add the video device input") // assumed: body missing from the post
            }
            captureSession?.addInput(videoDeviceInput) // assumed: call missing from the post
        }
    }
    do {
        let audioDevice = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeAudio)
        let audioDeviceInput: AVCaptureDeviceInput
        do {
            audioDeviceInput = try AVCaptureDeviceInput(device: audioDevice)
        }
        catch {
            fatalError("Could not create AVCaptureDeviceInput instance with error: \(error).")
        }
        guard (captureSession?.canAddInput(audioDeviceInput))! else {
            fatalError("Could not add the audio device input") // assumed: body missing from the post
        }
        captureSession?.addInput(audioDeviceInput) // assumed: call missing from the post
    }
    do {
        let dataOutput = AVCaptureVideoDataOutput()
        dataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_32BGRA]
        dataOutput.alwaysDiscardsLateVideoFrames = true
        let queue = DispatchQueue(label: "com.3DTOPO.videosamplequeue")
        dataOutput.setSampleBufferDelegate(self, queue: queue)
        guard (captureSession?.canAddOutput(dataOutput))! else {
            fatalError("Could not add the video data output") // assumed: body missing from the post
        }
        captureSession?.addOutput(dataOutput) // assumed: call missing from the post
        videoConnection = dataOutput.connection(withMediaType: AVMediaTypeVideo)
    }
    do {
        let audioDataOutput = AVCaptureAudioDataOutput()
        let queue = DispatchQueue(label: "com.3DTOPO.audiosamplequeue")
        audioDataOutput.setSampleBufferDelegate(self, queue: queue)
        guard (captureSession?.canAddOutput(audioDataOutput))! else {
            fatalError("Could not add the audio data output") // assumed: body missing from the post
        }
        captureSession?.addOutput(audioDataOutput) // assumed: call missing from the post
        audioConnection = audioDataOutput.connection(withMediaType: AVMediaTypeAudio)
    }
    // this will trigger capture on its own queue
    captureSession?.startRunning() // assumed: call missing from the post
}
The AVCaptureVideoDataOutput delegate method:
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    // func captureOutput(captureOutput: AVCaptureOutput, sampleBuffer: CMSampleBuffer, connection:AVCaptureConnection) {
    var error: CVReturn
    if (connection == audioConnection) {
        delegate?.audioSampleUpdated(sampleBuffer: sampleBuffer)
    }
    // ... Write video buffer ...//
}
Which calls:
func audioSampleUpdated(sampleBuffer: CMSampleBuffer) {
    if (isRecording) {
        while !assetWriterAudioInput.isReadyForMoreMediaData {}
        if (!assetWriterAudioInput.append(sampleBuffer)) {
            print("Unable to write to audio input")
        }
    }
}
If I disable the assetWriterAudioInput.append() call above, then the video isn't corrupt, but of course I have no audio encoded. How can I get both video and audio encoding to work?
I figured it out. I was setting the assetWriter.startSession source time to 0, and then subtracting the start time from the current CACurrentMediaTime() when writing the pixel data.
I changed the assetWriter.startSession source time to CACurrentMediaTime() and no longer subtract the start time when writing the video frame.
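My understanding of why this matters, as a rough illustration rather than anything taken from the AVFoundation internals: the audio sample buffers are appended unmodified, so they keep the timestamps the capture session gave them, while the video frames were being rebased to start at zero. The toy model below uses plain seconds and made-up numbers (`Sample`, `timelineSpan`, and `hostNow` are hypothetical names, and the assumption that the audio stamps sit on the same clock as CACurrentMediaTime() is inferred from the fix working).

```swift
// Toy model: timestamps are plain seconds; all values are made up for illustration.
struct Sample {
    let track: String
    let pts: Double   // presentation timestamp in seconds
}

/// How far the latest sample lies from the session start time.
func timelineSpan(sessionStart: Double, samples: [Sample]) -> Double {
    let latest = samples.map { $0.pts }.max() ?? sessionStart
    return latest - sessionStart
}

let hostNow = 5_000.0 // hypothetical host-clock reading when recording starts

// Old setup: session starts at 0, video is rebased to 0, audio keeps its capture stamps.
let oldSamples = [
    Sample(track: "video", pts: 0.0),
    Sample(track: "video", pts: 1.0),
    Sample(track: "audio", pts: hostNow),
    Sample(track: "audio", pts: hostNow + 1.0),
]
let oldSpan = timelineSpan(sessionStart: 0.0, samples: oldSamples)

// New setup: session starts at the host-clock time and both tracks use host-clock stamps.
let newSamples = [
    Sample(track: "video", pts: hostNow),
    Sample(track: "video", pts: hostNow + 1.0),
    Sample(track: "audio", pts: hostNow),
    Sample(track: "audio", pts: hostNow + 1.0),
]
let newSpan = timelineSpan(sessionStart: hostNow, samples: newSamples)

print(oldSpan) // 5001.0
print(newSpan) // 1.0
```

In the old setup one second of recording spans thousands of seconds on the writer's timeline, which would be consistent with the player being confused about the duration.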
Old start session code:
assetWriter.startSession(atSourceTime: kCMTimeZero)
New code that works:
let presentationStartTime = CMTimeMakeWithSeconds(CACurrentMediaTime(), 240)
assetWriter.startSession(atSourceTime: presentationStartTime)
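A variation that might be worth considering, sketched here under assumptions rather than tested against the project above: instead of reading CACurrentMediaTime() separately, start the session at the presentation timestamp of the first sample buffer that arrives. `SessionStarter` and `startSessionIfNeeded(with:)` are hypothetical names. The sketch assumes startWriting() has already succeeded, that calls are serialized (the setup above uses separate queues for audio and video, so that would need attention), and that video frames are then also written with their own buffer timestamps.

```swift
// Guarded so the snippet still compiles on platforms without AVFoundation.
#if canImport(AVFoundation)
import AVFoundation

/// Hypothetical helper: starts the writer session at the first buffer's timestamp.
final class SessionStarter {
    private(set) var hasStartedSession = false
    private let assetWriter: AVAssetWriter

    init(assetWriter: AVAssetWriter) {
        self.assetWriter = assetWriter
    }

    /// Call before appending any buffer. Assumes startWriting() was already called
    /// and that callers do not invoke this concurrently from different queues.
    func startSessionIfNeeded(with sampleBuffer: CMSampleBuffer) {
        guard !hasStartedSession else { return }
        // Use the buffer's own timestamp so the session and the samples share one timeline.
        let firstTimestamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
        assetWriter.startSession(atSourceTime: firstTimestamp)
        hasStartedSession = true
    }
}
#endif
```

The idea is the same as the fix above (session start and sample timestamps on one timeline), just without relying on a second clock reading.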
