Alexander Polus

Reputation: 9

Voice to String in Swift

The app I'm currently making in Swift is meant to help blind people navigate the world with one comprehensive solution. I am looking to write a generic function that, when called, immediately starts recording, listens for the user to say something, and, once the user stops speaking, automatically stops recording, converts the recording to a string, and returns it. This function should be callable more than once in a single view controller.

I tried the technique from this article, but it didn't work:

The recorder will be collecting the name of a building or of a room in a building, so it doesn't need to record for very long; even a fixed recording length of 5 seconds would work. I am hoping to use a framework like Apple's Speech framework or something Siri-based, but I am not opposed to an external framework like Watson if it works better. Please help!
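The stopping rule described here (end as soon as the user pauses, but never record longer than about 5 seconds) can be written as a small check that does not depend on any speech framework. This is a sketch under assumptions: `EndpointPolicy`, its property names, and the 1.5-second pause threshold are hypothetical choices, and a "pause" is approximated as "the transcript has not changed for a while" rather than true voice-activity detection.

```swift
import Foundation

// Hypothetical helper, not part of Apple's Speech framework: decides when a
// listening session should end, either on a pause or at a hard time cap.
struct EndpointPolicy {
    var silenceTimeout: TimeInterval = 1.5  // assumed pause length; tune with real users
    var maxDuration: TimeInterval = 5.0     // the fixed 5-second limit mentioned above

    /// - Parameters:
    ///   - now: the current time
    ///   - start: when recording began
    ///   - lastSpeech: when the transcript last changed, or nil if nothing was heard yet
    func shouldStop(now: Date, start: Date, lastSpeech: Date?) -> Bool {
        // Never record for longer than maxDuration.
        if now.timeIntervalSince(start) >= maxDuration {
            return true
        }
        // A pause only counts after the user has said something,
        // so a slow start does not end the session early.
        if let lastSpeech = lastSpeech, now.timeIntervalSince(lastSpeech) >= silenceTimeout {
            return true
        }
        return false
    }
}
```

A caller would evaluate `shouldStop` a few times per second, for example from a `Timer`, passing the time the transcript last changed as `lastSpeech`.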

Upvotes: 0

Views: 1437

Answers (1)


Reputation: 1854

There's a beautiful AppCoda tutorial here that fits this perfectly.

This is the code they use to update a text view with the speech results. It shouldn't be too difficult to route the text that goes into their text view into whatever variable or function you use to process the result.

//  ViewController.swift
//  Siri
//  Created by Sahand Edrisian on 7/14/16.
//  Copyright © 2016 Sahand Edrisian. All rights reserved.

import UIKit
import Speech
import AVFoundation  // AVAudioEngine and AVAudioSession

class ViewController: UIViewController, SFSpeechRecognizerDelegate {

    @IBOutlet weak var textView: UITextView!
    @IBOutlet weak var microphoneButton: UIButton!

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale.init(identifier: "en-US"))!

    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()

    override func viewDidLoad() {
        super.viewDidLoad()

        microphoneButton.isEnabled = false

        speechRecognizer.delegate = self

        SFSpeechRecognizer.requestAuthorization { (authStatus) in

            var isButtonEnabled = false

            switch authStatus {
            case .authorized:
                isButtonEnabled = true

            case .denied:
                isButtonEnabled = false
                print("User denied access to speech recognition")

            case .restricted:
                isButtonEnabled = false
                print("Speech recognition restricted on this device")

            case .notDetermined:
                isButtonEnabled = false
                print("Speech recognition not yet authorized")

            @unknown default:
                isButtonEnabled = false
            }

            // The authorization callback is not guaranteed to run on the main
            // queue, so the UI update is pushed back onto it.
            OperationQueue.main.addOperation() {
                self.microphoneButton.isEnabled = isButtonEnabled
            }
        }
    }

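    @IBAction func microphoneTapped(_ sender: AnyObject) {
        if audioEngine.isRunning {
            // Stop capturing audio and tell the request no more audio is coming,
            // so the recognizer can deliver its final result.
            audioEngine.stop()
            recognitionRequest?.endAudio()
            microphoneButton.isEnabled = false
            microphoneButton.setTitle("Start Recording", for: .normal)
        } else {
            startRecording()
            microphoneButton.setTitle("Stop Recording", for: .normal)
        }
    }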

    func startRecording() {

        if recognitionTask != nil {  //1
            recognitionTask?.cancel()
            recognitionTask = nil
        }

        let audioSession = AVAudioSession.sharedInstance()  //2
        do {
            try audioSession.setCategory(.record, mode: .measurement)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()  //3

        let inputNode = audioEngine.inputNode  //4 (non-optional since iOS 11, so no guard is needed)

        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        } //5

        recognitionRequest.shouldReportPartialResults = true  //6

        recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in  //7

            var isFinal = false  //8

            if let result = result {
                self.textView.text = result.bestTranscription.formattedString  //9
                isFinal = result.isFinal
            }

            if error != nil || isFinal {  //10
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)

                self.recognitionRequest = nil
                self.recognitionTask = nil

                self.microphoneButton.isEnabled = true
            }
        })

        let recordingFormat = inputNode.outputFormat(forBus: 0)  //11
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            // Feed each captured audio buffer to the recognition request.
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()  //12

        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }

        textView.text = "Say something, I'm listening!"
    }


    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool) {
        if available {
            microphoneButton.isEnabled = true
        } else {
            microphoneButton.isEnabled = false
        }
    }
}
Upvotes: 2
