elbert rivas

Reputation: 1464

How to parse the body result from an Alexa custom skill to JSON

I was able to set up the Alexa custom skill and call it from my iOS app. When I test the skill in the Alexa console simulator, I can see the JSON output, and I want to get that JSON from my iOS app; however, I'm having a problem with how to parse it. My iOS code is otherwise working fine, as I can hear the text response from my custom skill.

Here's the sample JSON output I want to get/parse

    {
        "body": {
            "version": "1.0",
            "response": {
                "outputSpeech": {
                    "type": "SSML",
                    "ssml": "<speak>Hello this is a custom skill</speak>"
                },
                "type": "_DEFAULT_RESPONSE"
            },
            "sessionAttributes": {},
            "userAgent": "ask-node/2.9.0 Node/v10.23.1 sample/hello-world/v1.2"
        }
    }
    
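For reference, once that JSON is in hand, it maps onto `Codable` types along these lines. This is only a sketch: the struct names (`SkillEnvelope`, `SkillBody`, etc.) are my own illustrations, not types from any Amazon SDK.

```swift
import Foundation

// Hypothetical Codable models mirroring the sample JSON above.
struct SkillEnvelope: Decodable {
    let body: SkillBody
}

struct SkillBody: Decodable {
    let version: String
    let response: SkillResponse
    let userAgent: String
}

struct SkillResponse: Decodable {
    let outputSpeech: OutputSpeech
    let type: String
}

struct OutputSpeech: Decodable {
    let type: String
    let ssml: String
}

let json = """
{
    "body": {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": "<speak>Hello this is a custom skill</speak>"
            },
            "type": "_DEFAULT_RESPONSE"
        },
        "sessionAttributes": {},
        "userAgent": "ask-node/2.9.0 Node/v10.23.1 sample/hello-world/v1.2"
    }
}
"""

// Decode and pull out the SSML string.
let envelope = try! JSONDecoder().decode(SkillEnvelope.self, from: Data(json.utf8))
print(envelope.body.response.outputSpeech.ssml)
// prints "<speak>Hello this is a custom skill</speak>"
```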

Here's the code for posting the recording

    func postRecording(authToken: String, jsonData: String, audioData: Data) {
        var request = URLRequest(url: URL(string: EVENTS_ENDPOINT)!)
        //request.cachePolicy = NSURLRequest.CachePolicy.ReloadIgnoringCacheData
        request.httpShouldHandleCookies = false
        request.timeoutInterval = 60
        request.httpMethod = "POST"
        request.setValue("Bearer \(authToken)", forHTTPHeaderField: "Authorization")
        
        // Build a multipart/form-data body with a JSON metadata part and a raw audio part.
        let boundary = UUID().uuidString
        let contentType = "multipart/form-data; boundary=\(boundary)"
        
        request.setValue(contentType, forHTTPHeaderField: "Content-Type")
        
        var bodyData = Data()
        
        bodyData.append("--\(boundary)\r\n".data(using: String.Encoding.utf8)!)
        bodyData.append("Content-Disposition: form-data; name=\"metadata\"\r\n".data(using: String.Encoding.utf8)!)
        bodyData.append("Content-Type: application/json; charset=UTF-8\r\n\r\n".data(using: String.Encoding.utf8)!)
        bodyData.append(jsonData.data(using: String.Encoding.utf8)!)
        bodyData.append("\r\n".data(using: String.Encoding.utf8)!)
        
        bodyData.append("--\(boundary)\r\n".data(using: String.Encoding.utf8)!)
        bodyData.append("Content-Disposition: form-data; name=\"audio\"\r\n".data(using: String.Encoding.utf8)!)
        bodyData.append("Content-Type: audio/L16; rate=16000; channels=1\r\n\r\n".data(using: String.Encoding.utf8)!)
        bodyData.append(audioData)
        bodyData.append("\r\n".data(using: String.Encoding.utf8)!)
        
        bodyData.append("--\(boundary)--\r\n".data(using: String.Encoding.utf8)!)
        
        session.uploadTask(with: request, from: bodyData, completionHandler: { (data:Data?, response:URLResponse?, error:Error?) -> Void in
            if (error != nil) {
                print("Send audio error: \(String(describing: error?.localizedDescription))")
            } else {
                let res = response as! HTTPURLResponse
                if (res.statusCode >= 200 && res.statusCode <= 299) {
                    if let contentTypeHeader = res.allHeaderFields["Content-Type"] {
                        let boundary = self.extractBoundary(contentTypeHeader: contentTypeHeader as! String)
                        let directives = self.extractDirectives(data: data!, boundary: boundary)
                        self.directiveHandler?(directives)
                    } else {
                        print("Content type in response is empty")
                    }
                }
            }
        }).resume()
    }

    func extractDirectives(data: Data, boundary: String) -> [DirectiveData] {
        var directives = [DirectiveData]()
        
        let innerBoundary = "--\(boundary)".data(using: String.Encoding.utf8)!
        let endBoundary = "--\(boundary)--".data(using: String.Encoding.utf8)!
        let contentTypeApplicationJson = "Content-Type: application/json; charset=UTF-8".data(using: String.Encoding.utf8)!
        let contentTypeAudio = "Content-Type: application/octet-stream".data(using: String.Encoding.utf8)!
        let headerEnd = "\r\n\r\n".data(using: String.Encoding.utf8)!
        
        var startIndex = 0
        while (true) {
            // Find the part delimited by two consecutive boundaries (or by the final boundary).
            let firstAppearance = data.range(of: innerBoundary, in: startIndex..<(data.count))
            if (firstAppearance == nil) {
                break
            }
            var secondAppearance = data.range(of: innerBoundary, in: (firstAppearance?.upperBound)!..<(data.count))
            if (secondAppearance == nil) {
                secondAppearance = data.range(of: endBoundary, in: (firstAppearance?.upperBound)!..<(data.count))
                if (secondAppearance == nil) {
                    break
                }
            } else {
                startIndex = (secondAppearance?.lowerBound)!
            }
            let subdata = data.subdata(in: (firstAppearance?.upperBound)!..<(secondAppearance?.lowerBound)!)
            var contentType = subdata.range(of: contentTypeApplicationJson)
            if (contentType != nil) {
                let headerRange = subdata.range(of: headerEnd)
                var directiveData = String(data: subdata.subdata(in: (headerRange?.upperBound)!..<subdata.count), encoding: String.Encoding.utf8) ?? "Directive data is not String"
                directiveData = directiveData.replacingOccurrences(of: "\r\n", with: "")
                directives.append(DirectiveData(contentType: "application/json", data: directiveData.data(using: String.Encoding.utf8)!))
                print("Directive: \(directiveData)")
            }
            contentType = subdata.range(of: contentTypeAudio)
            if (contentType != nil) {
                let headerRange = subdata.range(of: headerEnd)
                let audioData = subdata.subdata(in: (headerRange?.upperBound)!..<subdata.count)
                directives.append(DirectiveData(contentType: "application/octet-stream", data: audioData))
                print("Audio data")
            }
        }
        return directives
    }
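For context, a caller of `extractDirectives` might then separate the parts by content type along these lines. This is only a sketch: `DirectiveData` is assumed to be a plain struct with `contentType` and `data` fields, as the code above implies.

```swift
import Foundation

// Assumed shape of DirectiveData, matching how extractDirectives uses it.
struct DirectiveData {
    let contentType: String
    let data: Data
}

// Split the extracted parts into parsed JSON directives and raw audio chunks.
func handleDirectives(_ directives: [DirectiveData]) -> (json: [[String: Any]], audio: [Data]) {
    var jsonParts = [[String: Any]]()
    var audioParts = [Data]()
    for directive in directives {
        if directive.contentType == "application/json" {
            // Parse the JSON payload; skip the part if it is not a JSON object.
            if let object = (try? JSONSerialization.jsonObject(with: directive.data)) as? [String: Any] {
                jsonParts.append(object)
            }
        } else if directive.contentType == "application/octet-stream" {
            audioParts.append(directive.data)
        }
    }
    return (jsonParts, audioParts)
}
```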

My current code only returns this sample response:

    {
        "header": {
            "namespace": "SpeechSynthesizer",
            "name": "Speak",
            "messageId": "35601dca-413f-4788-93ed-305e04328d0e",
            "dialogRequestId": "db933fd9-3766-407d-99f6-2b33d9c814e6",
            "keys": {
                "isBlocking": true,
                "channel": "audio"
            }
        },
        "payload": {
            "format": "AUDIO_MPEG",
            "token": "amzn1.as-ct.v1.ThirdPartySdkSpeechlet#ACRI#ValidatedSpeakDirective_amzn1.ask.skill.5718d3ec-b56a-4a5a-bd61-fdd4ab501fc2_ca3ae40b-52cd-4be9-b831-98e975a57e5a_VoiceInitiated#ACRI#[[ENCRYPTED_WITH_AlexaServiceKeyMasterUtil]]AAAAAAAAAAAHT6zrrzQh3EDQVwUGrjOpUAAAAAAAAACFZJ/VU0i9/OCY89UvtcV9X0291NLF7L7nE4irlDAGaph7NlUpgq0Ps31s8OflwHh/9wr3vp5Py+Gz+2Lgd8N6fPXWxPHp8+E6R8sLiRh9Cg=="
        }
    }
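If all you need from this directive is its header and payload metadata, it can be decoded with `Codable` along these lines. This is a sketch: the struct names are mine, not from any Amazon SDK, and the token is shortened for the example.

```swift
import Foundation

// Hypothetical Codable models for the Speak directive shown above.
// Fields not declared here (e.g. "keys") are simply ignored by JSONDecoder.
struct SpeakDirective: Decodable {
    let header: Header
    let payload: Payload
}

struct Header: Decodable {
    let namespace: String
    let name: String
    let messageId: String
    let dialogRequestId: String
}

struct Payload: Decodable {
    let format: String
    let token: String
}

let json = """
{
    "header": {
        "namespace": "SpeechSynthesizer",
        "name": "Speak",
        "messageId": "35601dca-413f-4788-93ed-305e04328d0e",
        "dialogRequestId": "db933fd9-3766-407d-99f6-2b33d9c814e6",
        "keys": { "isBlocking": true, "channel": "audio" }
    },
    "payload": {
        "format": "AUDIO_MPEG",
        "token": "amzn1.as-ct.v1.ThirdPartySdkSpeechlet"
    }
}
"""

let directive = try! JSONDecoder().decode(SpeakDirective.self, from: Data(json.utf8))
print("\(directive.header.namespace).\(directive.header.name)")
// prints "SpeechSynthesizer.Speak"
```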

Upvotes: 0

Views: 689

Answers (1)

Greg Bulmash

Reputation: 1957

If I'm reading this right, you won't get that response.

It looks like you have an AVS client posting audio to the Alexa Voice Service.

The audio you send is received by the service, converted into an intent and data, and sent to your skill.

The response you're looking for with the "speak" tags is sent by the ASK SDK from your skill to Alexa.

Alexa converts that into audio and sends the audio to your AVS client, but it doesn't send a copy of the direct skill output that produced it. You simply can't parse it out of the response.

Upvotes: 2
