Reputation: 11
When submitting a document to the API for key phrases, the returned JSON response has the error "A document within the request was too large to be processed. Limit document size to: 10240 bytes."
According to https://learn.microsoft.com/en-us/azure/cognitive-services/cognitive-services-text-analytics-quick-start, "The maximum size of a single document that can be submitted is 10KB, and the total maximum size of submitted input is 1MB. No more than 1,000 documents may be submitted in one call."
The document in question is a string of length 7713. The byte length using Encoding.UTF8.GetBytes() is 7763.
The entire byteArray that is submitted is of length 7965.
Smaller strings work fine, but any strings greater than length 3000 seem to have this problem. Below is the code, written in VB.NET:
' Create a JSONInput object containing the data to submit
Dim myJsonObject As New JSONInput
Dim input1 As New JSONText
input1.id = "1"
input1.text = text
myJsonObject.documents.Add(input1)
' Translate the JSONInput object to a serialized JSON string
Dim jss As New JavaScriptSerializer()
Dim jsonString As String = jss.Serialize(myJsonObject)
' Windows Cognitive Services URL
Dim request As System.Net.WebRequest = System.Net.WebRequest.Create("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases")
' Set the Method property of the request to POST.
request.Method = "POST"
' Add a header with the account key.
request.Headers.Add("Ocp-Apim-Subscription-Key", accountKey_TextAnalytics)
' Create POST data and convert it to a byte array.
Dim postData As String = jsonString
Dim byteArray As Byte() = Encoding.UTF8.GetBytes(postData)
' Set the ContentType property of the WebRequest.
request.ContentType = "application/json"
' Set the ContentLength property of the WebRequest.
request.ContentLength = byteArray.Length
' Get the request stream.
Dim dataStream As System.IO.Stream = request.GetRequestStream()
' Write the data to the request stream.
dataStream.Write(byteArray, 0, byteArray.Length)
' Close the Stream object.
dataStream.Close()
' Get the response.
Dim response As System.Net.WebResponse = request.GetResponse()
' Get the stream containing content returned by the server.
dataStream = response.GetResponseStream()
' Open the stream using a StreamReader for easy access.
Dim reader As New System.IO.StreamReader(dataStream)
' Read the content.
Dim responseFromServer As String = reader.ReadToEnd()
' Display the content.
Console.WriteLine(responseFromServer)
' Clean up the streams.
reader.Close()
dataStream.Close()
response.Close()
' Deserialize the json data
jss = New JavaScriptSerializer()
Dim jsonData = jss.Deserialize(Of Object)(responseFromServer)
' List of key phrases to be returned
Dim phrases As New List(Of String)
For Each item As String In jsonData("documents")(0)("keyPhrases")
phrases.Add(item)
Next
Return phrases
My question is, what might I be doing wrong here, that I'm receiving messages that my document is exceeding the size limit of 10240 bytes, but it appears that the data that I POST is well under that limit.
Upvotes: 1
Views: 546
Reputation: 546
As Assaf mentioned above, please make sure to specify UTF-8 encoding.
Upvotes: 0