Azure KeyPhrase API returning 400 at times

I'm getting mixed results with the Azure KeyPhrase API: sometimes a request succeeds (a 200 response) and other times I get a 400 Bad Request. To test the service, I'm sending the contents of an Azure PDF about their NoSQL service.

The documentation says that each document may be up to 5k characters. I started off with 5k, but to rule the limit out as the cause I'm now limiting each document to at most 1k characters.

How can I get more information about the cause of the failure? I've already checked the Portal, but there's not much detail there.

I am using this endpoint: https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases

Some sample failures:

*** Added my quick/dirty PoC code ***

List<string> sendRequest(object data)
    {
        string url = "https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases";
        string key = "api-code-here";
        string hdr = "Ocp-Apim-Subscription-Key";
        var wc = new WebClient();
        wc.Headers.Add(hdr, key);
        wc.Headers.Add(HttpRequestHeader.ContentType, "application/json");

        TextAnalyticsResult results = null;

        string json = JsonConvert.SerializeObject(data);
        try
        {
            var bytes = Encoding.Default.GetBytes(json);
            var d2 = wc.UploadData(url, bytes);
            var dataString = Encoding.Default.GetString(d2);
            results = JsonConvert.DeserializeObject<TextAnalyticsResult>(dataString);                
        }
        catch (Exception ex)
        {
            var s = ex.Message;
        }
        System.Threading.Thread.Sleep(125);

        if (results != null && results.documents != null)
            return results.documents.SelectMany(x => x.keyPhrases).ToList();
        else
            return new List<string>();
    }

Called by:

foreach (var k in vals)
        {
            data.documents.Clear();
            int countSpaces = k.Count(Char.IsWhiteSpace);
            if (countSpaces > 3)
            {
                if (k.Length > maxLen)
                {
                    var v = k;
                    while (v.Length > maxLen)
                    {
                        var tmp = v.Substring(0, maxLen);
                        var idx = tmp.LastIndexOf(" ");
                        tmp = tmp.Substring(0, idx).Trim();
                        data.documents.Add(new
                        {
                            language = "en",
                            id = data.documents.Count() + 1,
                            text = tmp
                        });
                        v = v.Substring(idx + 1).Trim();

                        phrases.AddRange(sendRequest(data));
                        data.documents.Clear();
                    }

                    data.documents.Add(new
                    {
                        language = "en",
                        id = data.documents.Count() + 1,
                        text = v
                    });
                    phrases.AddRange(sendRequest(data));
                    data.documents.Clear();
                }
                else
                {
                    data.documents.Add(new
                    {
                        language = "en",
                        id = 1,
                        text = k
                    });

                    phrases.AddRange(sendRequest(data));
                    data.documents.Clear();
                };
            }             
        }

Upvotes: 0

Views: 282

Answers (2)

Brian Smith - MSFT

Reputation: 116

I manually created some requests using the document samples that you indicated had errors, and the service processed them correctly and returned key phrases. So an encoding issue looks likely.

In the future, you can also look at the inner error returned by the service. It usually contains more detail, as in the response sample below.

{
  "code": "BadRequest",
  "message": "Invalid request",
  "innerError": {
    "code": "InvalidRequestContent",
    "message": "Request contains duplicated Ids. Make sure each document has a unique Id."
  }
}
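With WebClient, a 400 response surfaces as a WebException, and the JSON body shown above is attached to its Response property. The question's catch block only keeps ex.Message, so that detail is lost. A minimal sketch of reading the body follows; the class and method names (ErrorBodyReader, ReadErrorBody) are made up for illustration, and it assumes the service returns the error body as UTF-8 text.

```csharp
using System.IO;
using System.Net;
using System.Text;

static class ErrorBodyReader
{
    // Hypothetical helper: returns the response body attached to a
    // WebException, or null when no response (or no body) is available,
    // e.g. for a connection failure.
    public static string ReadErrorBody(WebException ex)
    {
        if (ex.Response == null)
            return null;

        using (var stream = ex.Response.GetResponseStream())
        {
            if (stream == null)
                return null;

            // Assumption: the error body is UTF-8 encoded JSON.
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

In sendRequest, this could be used by catching WebException ahead of the general Exception handler and logging ErrorBodyReader.ReadErrorBody(ex), which should show the innerError code and message for the failing request.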

Also, there is a .NET SDK for Text Analytics that can help simplify calling the service. https://github.com/Azure/azure-rest-api-specs/tree/current/specification/cognitiveservices/data-plane/TextAnalytics

Upvotes: 3

Maria Ines Parnisari

Reputation: 17466

Try changing this line

var bytes = Encoding.Default.GetBytes(json);

to

var bytes = Encoding.UTF8.GetBytes(json);
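To see why this can matter: on .NET Framework, Encoding.Default is the machine's ANSI code page, which writes non-ASCII characters as single bytes that are not valid UTF-8. The sketch below uses ISO-8859-1 as a stand-in for such a code page (an assumption, since the real code page depends on the machine), and the helper name ToHex is made up for illustration. Text extracted from a PDF often contains accented letters or typographic quotes, which would explain why only some requests fail.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: hex dump of the bytes that the given
    // encoding produces for a string, e.g. "C3-A9".
    public static string ToHex(string s, Encoding enc)
    {
        return BitConverter.ToString(enc.GetBytes(s));
    }
}
```

For example, EncodingDemo.ToHex("\u00e9", Encoding.UTF8) gives "C3-A9", while EncodingDemo.ToHex("\u00e9", Encoding.GetEncoding("ISO-8859-1")) gives "E9". A lone E9 byte is not a valid UTF-8 sequence, so a service that parses the request body as UTF-8 JSON may reject it. Pure ASCII text encodes identically in both cases, which fits the pattern of some documents succeeding.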

Upvotes: 1
