Reputation: 879
I'm getting mixed results with the Azure KeyPhrase API: sometimes the call succeeds (by that I mean a 200 result) and other times I get a 400 Bad Request. To test the service, I'm sending the contents of an Azure PDF on their NoSQL service.
The documentation says that each document may be up to 5k characters. To rule that out, I'm limiting each document to at most 1k characters (I started off with 5k).
How can I get more info on the cause of the failure? I've already checked the Portal, but there's not much detail there.
I am using this endpoint: https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases
Some sample failures:
{"documents":[{"language":"en","id":1,"text":"David Chappell Understanding NoSQL on Microsoft Azure Sponsored by Microsoft Corporation Copyright © 2014 Chappell & Associates"}]}
{"documents":[{"language":"en","id":1,"text":"3 Relational technology has been the dominant approach to working with data for decades. Typically accessed using Structured Query Language (SQL), relational databases are incredibly useful. And as their popularity suggests, they can be applied in many different situations. But relational technology isn’t always the best approach. Suppose you need to work with very large amounts of data, for example, too much to store on a single machine. Scaling relational technology to work effectively across many servers (physical or virtual) can be challenging. Or suppose your application works with data that’s not a natural fit for relational systems, such as JavaScript Object Notation (JSON) documents. Shoehorning the data into relational tables is possible, but a storage technology expressly designed to work with this kind of information might be simpler. NoSQL technologies have been created to address problems like these. As the name suggests, the label encompasses a variety of storage"}]}
Added my quick-and-dirty PoC code:
List<string> sendRequest(object data)
{
    string url = "https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases";
    string key = "api-code-here";
    string hdr = "Ocp-Apim-Subscription-Key";
    var wc = new WebClient();
    wc.Headers.Add(hdr, key);
    wc.Headers.Add(HttpRequestHeader.ContentType, "application/json");
    TextAnalyticsResult results = null;
    string json = JsonConvert.SerializeObject(data);
    try
    {
        var bytes = Encoding.Default.GetBytes(json);
        var d2 = wc.UploadData(url, bytes);
        var dataString = Encoding.Default.GetString(d2);
        results = JsonConvert.DeserializeObject<TextAnalyticsResult>(dataString);
    }
    catch (Exception ex)
    {
        var s = ex.Message;
    }
    System.Threading.Thread.Sleep(125);
    if (results != null && results.documents != null)
        return results.documents.SelectMany(x => x.keyPhrases).ToList();
    else
        return new List<string>();
}
Called by:
foreach (var k in vals)
{
    data.documents.Clear();
    int countSpaces = k.Count(Char.IsWhiteSpace);
    if (countSpaces > 3)
    {
        if (k.Length > maxLen)
        {
            var v = k;
            while (v.Length > maxLen)
            {
                var tmp = v.Substring(0, maxLen);
                var idx = tmp.LastIndexOf(" ");
                tmp = tmp.Substring(0, idx).Trim();
                data.documents.Add(new
                {
                    language = "en",
                    id = data.documents.Count() + 1,
                    text = tmp
                });
                v = v.Substring(idx + 1).Trim();
                phrases.AddRange(sendRequest(data));
                data.documents.Clear();
            }
            data.documents.Add(new
            {
                language = "en",
                id = data.documents.Count() + 1,
                text = v
            });
            phrases.AddRange(sendRequest(data));
            data.documents.Clear();
        }
        else
        {
            data.documents.Add(new
            {
                language = "en",
                id = 1,
                text = k
            });
            phrases.AddRange(sendRequest(data));
            data.documents.Clear();
        }
    }
}
Upvotes: 0
Views: 282
Reputation: 116
I manually created some requests using the document samples that you indicated had errors and they were processed by the service correctly and returned key phrases. So an encoding issue looks likely.
In the future, you can also look at the inner error returned by the service. Usually you'll see some more details like in the response sample below.
{
"code": "BadRequest",
"message": "Invalid request",
"innerError": {
"code": "InvalidRequestContent",
"message": "Request contains duplicated Ids. Make sure each document has a unique Id."
}
}
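With WebClient, that error body is not part of ex.Message; it has to be read from the response attached to the WebException. A minimal sketch of how that could look is below. The ErrorBodyReader class and its method names are hypothetical helpers made up for illustration, and it assumes the service returns its error JSON as UTF-8.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of any SDK.
static class ErrorBodyReader
{
    // Reads an entire response stream as text (assumes UTF-8).
    public static string ReadBody(Stream body)
    {
        using (var reader = new StreamReader(body, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Returns "<status code> <response body>" when the server sent a response,
    // otherwise falls back to the exception message (e.g. for DNS failures).
    public static string Describe(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        if (response == null)
        {
            return ex.Message;
        }
        using (response)
        using (var stream = response.GetResponseStream())
        {
            return (int)response.StatusCode + " " + ReadBody(stream);
        }
    }
}
```

In the question's sendRequest, this would mean adding a catch (WebException ex) block ahead of the general catch and logging ErrorBodyReader.Describe(ex) instead of discarding ex.Message.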
Also, there is a .NET SDK for Text Analytics that can help simplify calling the service. https://github.com/Azure/azure-rest-api-specs/tree/current/specification/cognitiveservices/data-plane/TextAnalytics
Upvotes: 3
Reputation: 17466
Try changing this line
var bytes = Encoding.Default.GetBytes(json);
to
var bytes = Encoding.UTF8.GetBytes(json);
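Some background on why this probably matters: both failing samples in the question contain non-ASCII characters (© in the first, ’ in the second). On .NET Framework, Encoding.Default is the system's ANSI code page, so those characters are sent as bytes that are not valid UTF-8, which the service presumably rejects. The sketch below shows the difference for ©. EncodingDemo is a hypothetical helper, and ISO-8859-1 is used only as a stand-in for a typical ANSI code page; the actual Encoding.Default depends on the machine.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
static class EncodingDemo
{
    // Bytes the service should receive for a JSON body.
    public static byte[] AsUtf8(string s)
    {
        return Encoding.UTF8.GetBytes(s);
    }

    // Stand-in for a single-byte ANSI code page such as the one
    // Encoding.Default may return on .NET Framework (assumption).
    public static byte[] AsLatin1(string s)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(s);
    }

    // True if the text contains any character outside 7-bit ASCII,
    // i.e. any character whose bytes differ between the two encodings.
    public static bool HasNonAscii(string s)
    {
        foreach (char c in s)
        {
            if (c > 127) return true;
        }
        return false;
    }
}
```

BitConverter.ToString(EncodingDemo.AsUtf8("©")) gives C2-A9, while BitConverter.ToString(EncodingDemo.AsLatin1("©")) gives A9; a lone A9 byte is not a valid UTF-8 sequence. Pure ASCII text encodes identically either way, which would explain why only some requests fail. The response should be decoded with Encoding.UTF8.GetString as well. An alternative is to set wc.Encoding = Encoding.UTF8 and call wc.UploadString(url, json).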
Upvotes: 1