guido

Reputation: 2896

Boost / increase volume of text to speech (AVSpeechUtterance) to make it louder

I have a navigation app that gives voice direction instructions (e.g. "In 200 feet, turn left") using AVSpeechUtterance. I have set the volume to 1, like so: speechUtteranceInstance.volume = 1. But the volume is still very low compared to the music or podcast coming from the iPhone, especially when the sound plays over a Bluetooth or cabled connection (such as a car connected via Bluetooth).
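For reference, this is roughly how the utterance is set up (speechUtteranceInstance is the name used above; the instruction text is illustrative):

import AVFoundation

let speechUtteranceInstance = AVSpeechUtterance(string: "In 200 feet turn left")
// volume ranges from 0.0 to 1.0, and 1.0 is already the maximum
speechUtteranceInstance.volume = 1.0

let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(speechUtteranceInstance)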

Is there any way to boost the volume? (I know this has been asked before on SO, but so far I have not found a solution that works for me.)

Upvotes: 2

Views: 2652

Answers (3)

Vladimír

Reputation: 759

Try this:

import AVFoundation

// Use the .playback category so speech is routed at full media volume
try? AVAudioSession.sharedInstance().setCategory(.playback, mode: .default, options: [])

let utterance = AVSpeechUtterance(string: "Hello world")
utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")

let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(utterance)

Upvotes: -1

guido

Reputation: 2896

After a lot more research and experimenting, I found a good workaround.

First of all, I think this is an iOS bug. When all of the conditions below are true, I found that the voice instruction itself is also ducked (or at least it sounds ducked), resulting in the voice instruction playing at the same volume as the ducked background music, and thus way too soft to hear well. A minimal sketch reproducing these conditions is shown right after the list.

  • Playing music in the background
  • Ducking this background music via the .duckOthers AVAudioSession category option
  • Playing a speech utterance through AVSpeechSynthesizer
  • Playing audio over a connected Bluetooth device (like a Bluetooth headset or Bluetooth car speakers)
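A minimal sketch reproducing that combination, using the same session options as the complete code further down (the utterance text is illustrative):

import AVFoundation

let session = AVAudioSession.sharedInstance()
// .duckOthers lowers any background audio while this session is active
try? session.setCategory(.playback, mode: .voicePrompt, options: [.mixWithOthers, .duckOthers])
try? session.setActive(true, options: .notifyOthersOnDeactivation)

// Over Bluetooth, this utterance can come out as quiet as the ducked music
let utterance = AVSpeechUtterance(string: "In 200 feet turn left")
let synthesizer = AVSpeechSynthesizer() // keep a strong reference while speaking
synthesizer.speak(utterance)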

The workaround I found is to feed the speech utterance into an AVAudioEngine. This can only be done on iOS 13 or above, since that release adds the .write method to AVSpeechSynthesizer.

In short, I use AVAudioEngine, AVAudioUnitEQ and AVAudioPlayerNode, setting the globalGain property of the AVAudioUnitEQ to about 10 dB. There are a few quirks with this approach, but they can be worked around (see the code comments).

Here's the complete code:

import UIKit
import AVFoundation
import MediaPlayer

class ViewController: UIViewController {

    // MARK: AVAudio properties
    var engine = AVAudioEngine()
    var player = AVAudioPlayerNode()
    var eqEffect = AVAudioUnitEQ()
    var converter = AVAudioConverter(from: AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: 22050, channels: 1, interleaved: false)!, to: AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32, sampleRate: 22050, channels: 1, interleaved: false)!)
    let synthesizer = AVSpeechSynthesizer()
    var bufferCounter: Int = 0

    let audioSession = AVAudioSession.sharedInstance()

    override func viewDidLoad() {
        super.viewDidLoad()

        let outputFormat = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32, sampleRate: 22050, channels: 1, interleaved: false)!
        setupAudio(format: outputFormat, globalGain: 0)
    }

    func activateAudioSession() {
        do {
            try audioSession.setCategory(.playback, mode: .voicePrompt, options: [.mixWithOthers, .duckOthers])
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print("An error has occurred while setting the AVAudioSession.")
        }
    }

    @IBAction func tappedPlayButton(_ sender: Any) {
        eqEffect.globalGain = 0
        play()
    }

    @IBAction func tappedPlayLoudButton(_ sender: Any) {
        eqEffect.globalGain = 10
        play()
    }

    func play() {
        let path = Bundle.main.path(forResource: "voiceStart", ofType: "wav")!
        let file = try! AVAudioFile(forReading: URL(fileURLWithPath: path))
        self.player.scheduleFile(file, at: nil, completionHandler: nil)
        let utterance = AVSpeechUtterance(string: "This is to test if iOS is able to boost the voice output above the 100% limit.")
        synthesizer.write(utterance) { buffer in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer, pcmBuffer.frameLength > 0 else {
                print("could not create buffer or buffer empty")
                return
            }

            // QUIRK: Need to convert the buffer to a different format because AVAudioEngine does not support the format returned from AVSpeechSynthesizer
            let convertedBuffer = AVAudioPCMBuffer(pcmFormat: AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32, sampleRate: pcmBuffer.format.sampleRate, channels: pcmBuffer.format.channelCount, interleaved: false)!, frameCapacity: pcmBuffer.frameCapacity)!
            do {
                try self.converter!.convert(to: convertedBuffer, from: pcmBuffer)
                self.bufferCounter += 1
                self.player.scheduleBuffer(convertedBuffer, completionCallbackType: .dataPlayedBack, completionHandler: { (type) -> Void in
                    DispatchQueue.main.async {
                        self.bufferCounter -= 1
                        print(self.bufferCounter)
                        if self.bufferCounter == 0 {
                            self.player.stop()
                            self.engine.stop()
                            try! self.audioSession.setActive(false, options: [])
                        }
                    }

                })

                self.converter!.reset()
                //self.player.prepare(withFrameCount: convertedBuffer.frameLength)
            }
            catch let error {
                print(error.localizedDescription)
            }
        }
        activateAudioSession()
        if !self.engine.isRunning {
            try! self.engine.start()
        }
        if !self.player.isPlaying {
            self.player.play()
        }
    }

    func setupAudio(format: AVAudioFormat, globalGain: Float) {
        // QUIRK: Connecting the equalizer to the engine somehow starts the shared audio session, and if that session is not configured with .mixWithOthers and deactivated afterwards, it will stop any background music that was already playing. So first configure the audio session, then set up the engine, and then deactivate the session again.
        try? self.audioSession.setCategory(.playback, options: .mixWithOthers)

        eqEffect.globalGain = globalGain
        engine.attach(player)
        engine.attach(eqEffect)
        engine.connect(player, to: eqEffect, format: format)
        engine.connect(eqEffect, to: engine.mainMixerNode, format: format)
        engine.prepare()

        try? self.audioSession.setActive(false)
    }

}

Upvotes: 10

glotcha

Reputation: 578

The docs mention that the default for .volume is 1.0 and that this is the loudest; actual loudness is based on the user's volume settings. I didn't really have an issue with the speech being too quiet as long as the user has the volume turned up.

Maybe you could consider showing a visual warning if the user's volume level is below a certain threshold. It seems this answer shows how to do that via AVAudioSession.

AVAudioSession is worth exploring, as there are some settings that do impact speech output, for example whether your app's speech interrupts audio from other apps.
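A minimal sketch of such a check, assuming an illustrative threshold of 0.3 and a plain print in place of an actual UI warning:

import AVFoundation

// outputVolume reports the system output volume as a Float from 0.0 to 1.0
let volume = AVAudioSession.sharedInstance().outputVolume
if volume < 0.3 {
    print("Volume is low; consider showing a visual warning to the user")
}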

Upvotes: 0
