Reputation: 651

AWS CloudSearch cannot upload documents

I am new to AWS and CloudSearch. I have written a very simple app that uploads a .docx document (already converted to JSON with cs-import-documents) to my search domain.

The code is straightforward:

using (var searchdomainclient = new AmazonCloudSearchDomainClient("http://search-xxxxx-xysjxyuxjxjxyxj.ap-southeast-2.cloudsearch.amazonaws.com"))
{
    // Test uploading a document
    var uploaddocrequest = new UploadDocumentsRequest()
    {
        FilePath = @"c:\temp\testsearch.sdf",  // docx already converted to JSON
        ContentType = ContentType.ApplicationJson
    };
    var uploadresult = searchdomainclient.UploadDocuments(uploaddocrequest);
}

However, the exception I get is: "Root element is missing."

Here is the JSON in the .sdf file I want to upload:

[{
    "type": "add",
    "id": "c:_temp_testsearch.docx",
    "fields": {
        "template": "Normal.dotm",
        "application_name": "Microsoft Office Word",
        "paragraph_count": "1",
        "resourcename": "testsearch.docx",
        "date": "2014-07-28T23:52:00Z",
        "xmptpg_npages": "1",
        "page_count": "1",
        "publisher": "",
        "creator": "John Smith",
        "creation_date": "2014-07-28T23:52:00Z",
        "content": "Test5",
        "author": "John Smith",
        "last_modified": "2014-07-29T04:22:00Z",
        "revision_number": "3",
        "line_count": "1",
        "application_version": "15.0000",
        "last_author": "John Smith",
        "character_count": "5",
        "character_count_with_spaces": "5",
        "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    }
}]

So what's wrong with my approach?

Thanks heaps!

P.S. I can manually upload a .docx document to that search domain and run searches against it from C# code.

============= Update 2014-08-04 ===================

I am not sure whether this is related, but the stack trace shows the SDK trying to parse something as XML rather than JSON. My code already sets ContentType to JSON, but that seems to have no effect.

at System.Xml.XmlTextReaderImpl.ThrowWithoutLineInfo(String res)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at Amazon.Runtime.Internal.Transform.XmlUnmarshallerContext.Read()
at Amazon.Runtime.Internal.Transform.ErrorResponseUnmarshaller.Unmarshall(XmlUnmarshallerContext context)
at Amazon.Runtime.Internal.Transform.JsonErrorResponseUnmarshaller.Unmarshall(JsonUnmarshallerContext context)
at Amazon.CloudSearchDomain.Model.Internal.MarshallTransformations.UploadDocumentsResponseUnmarshaller.UnmarshallException(JsonUnmarshallerContext context, Exception innerException, HttpStatusCode statusCode)
at Amazon.Runtime.Internal.Transform.JsonResponseUnmarshaller.UnmarshallException(UnmarshallerContext input, Exception innerException, HttpStatusCode statusCode)
at Amazon.Runtime.AmazonWebServiceClient.HandleHttpWebErrorResponse(AsyncResult asyncResult, WebException we)
at Amazon.Runtime.AmazonWebServiceClient.getResponseCallback(IAsyncResult result)
at Amazon.Runtime.AmazonWebServiceClient.endOperation[T](IAsyncResult result)
at Amazon.CloudSearchDomain.AmazonCloudSearchDomainClient.EndUploadDocuments(IAsyncResult asyncResult)
at Amazon.CloudSearchDomain.AmazonCloudSearchDomainClient.UploadDocuments(UploadDocumentsRequest request)


Upvotes: 1

Answers (2)

vadym

Reputation: 1

I had exactly the same exception with SDK version 2.2.2.0. After I updated the SDK to version 2.2.2.1, the exception went away.

Upvotes: 0

alexroussos

Reputation: 2681

Your document ID contains invalid characters (a period and a colon). From https://aws.amazon.com/articles/8871401284621700:

The ID must be unique across all of the documents you upload to the domain and can contain the following characters: a-z (lowercase letters), 0-9, and the underscore character (_). Document IDs must start with a letter or number and can be up to 64 characters long.
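As a rough illustration of the rule quoted above, here is a minimal sketch that checks an ID before upload. It assumes the quoted rule is the one that applies to your domain's API version; `DocumentIdCheck`, `IsValid` and `Sanitize` are hypothetical helper names, not part of the AWS SDK, and the replace-with-underscore normalisation is only one possible choice.

```csharp
using System;
using System.Text.RegularExpressions;

static class DocumentIdCheck
{
    // Quoted rule: a-z, 0-9 and '_' only; first character is a letter
    // or digit; at most 64 characters in total.
    private static readonly Regex ValidId =
        new Regex(@"\A[a-z0-9][a-z0-9_]{0,63}\z");

    public static bool IsValid(string id)
    {
        return id != null && ValidId.IsMatch(id);
    }

    // Hypothetical normalisation: lowercase, replace every other
    // character with '_', drop leading underscores, cut to 64 chars.
    // The result can be empty, and two different inputs can map to the
    // same ID, so uniqueness still has to be checked by the caller.
    public static string Sanitize(string id)
    {
        string s = Regex.Replace(id.ToLowerInvariant(), "[^a-z0-9_]", "_");
        s = s.TrimStart('_');
        return s.Length > 64 ? s.Substring(0, 64) : s;
    }
}
```

With the ID from the question, `DocumentIdCheck.IsValid("c:_temp_testsearch.docx")` is false because of the colon and the period, and `Sanitize` turns it into `c__temp_testsearch_docx`.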

It is also unclear which endpoint you're posting to, so you may have a problem there as well.
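On the endpoint point: a CloudSearch domain exposes separate document and search endpoints, and uploads go to the document one. The sketch below is only a sanity check on the host name, assuming the usual `doc-` and `search-` prefixes; `EndpointCheck` and `LooksLikeDocumentEndpoint` are hypothetical names, not SDK calls.

```csharp
using System;

static class EndpointCheck
{
    // Assumption: document-service hosts start with "doc-" and
    // search-service hosts start with "search-".
    public static bool LooksLikeDocumentEndpoint(string serviceUrl)
    {
        Uri uri = new Uri(serviceUrl);
        return uri.Host.StartsWith("doc-", StringComparison.OrdinalIgnoreCase);
    }
}
```

The URL shown in the question starts with `search-`, so this check returns false for it. If that is the real value being passed to the client, it may be worth trying the domain's document endpoint instead.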

Upvotes: 2
