Kannaiyan

Reputation: 13025

Is AWS Cloudsearch Scalable?

I have 500MB worth of data to push to cloud search.

Here are the options I have tried:

Upload directly from console:

Tried to upload the file, but there is a 5 MB limit.

Upload to S3 and give the S3 URL in the console:

Uploaded the file to S3 and selected the S3 option, but it fails and asks to try the command line.

Tried with the command line:

aws cloudsearchdomain upload-documents --endpoint-url http://endpoint --content-type application/json --documents s3://bucket/cs.json

Error parsing parameter '--documents': Blob values must be a path to a file.

OK, copied the file from S3 to local and tried to upload.

Tried with a local file and the CLI:

aws cloudsearchdomain upload-documents --endpoint-url http://endpoint --content-type application/json --documents ./cs.json

Connection was closed before we received a valid response from endpoint URL: "http://endpoint/2013-01-01/documents/batch?format=sdk".

Any way to get CloudSearch to work?

Upvotes: 2

Views: 960

Answers (1)

Keet Sugathadasa

Reputation: 13522

As I understand it, this question is not really about the scalability of CloudSearch (as the title suggests), but about the upload limits and how to get a large file into Amazon CloudSearch.

The best solution is to upload the data in chunks: break your documents into batches and upload them batch by batch (keeping in mind the limits described below).

The advantage is that if you have multiple documents to submit, you can submit them all in a single call rather than always submitting batches of size 1. AWS recommends grouping documents into one batch (up to 5 MB) and sending it in one call. Each 1,000 batch calls cost about $0.10, I believe, so grouping also saves you some money.
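A minimal sketch of the batching step (the helper name and the compact JSON serialization are my own choices; the 5 MB batch and 1 MB document limits come from the CloudSearch documentation):

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024   # CloudSearch batch limit: 5 MB
MAX_DOC_BYTES = 1 * 1024 * 1024     # CloudSearch document limit: 1 MB

def make_batches(docs, max_batch=MAX_BATCH_BYTES):
    """Group CloudSearch operations into batches under the size limit.

    `docs` is a list of dicts already in the batch-document shape, e.g.
    {"type": "add", "id": "1", "fields": {...}}.
    """
    batches, current, current_size = [], [], 2  # 2 bytes for "[" and "]"
    for doc in docs:
        encoded = json.dumps(doc, separators=(",", ":"))
        size = len(encoded.encode("utf-8"))
        if size > MAX_DOC_BYTES:
            raise ValueError(f"document {doc.get('id')} exceeds 1 MB")
        # +1 accounts for the comma separating documents in the JSON array
        if current and current_size + size + 1 > max_batch:
            batches.append(current)
            current, current_size = [], 2
        current.append(doc)
        current_size += size + 1
    if current:
        batches.append(current)
    return batches
```

Each batch can then be serialized with `json.dumps(batch, separators=(",", ":"))` and uploaded as one `upload-documents` call.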

This worked for me. Given below are a few guidelines to help tackle the problem better.


Guidelines to follow when uploading data into Amazon CloudSearch:

  1. Group documents into batches before you upload them. Continuously uploading batches that consist of only one document has a huge, negative impact on the speed at which Amazon CloudSearch can process your updates. Instead, create batches that are as close to the limit as possible and upload them less frequently. (The limits are explained below)

  2. To upload data to your domain, it must be formatted as a valid JSON or XML batch.
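For reference, a valid JSON batch is an array of `add` and `delete` operations (the field names here are illustrative):

```json
[
  {"type": "add", "id": "doc1", "fields": {"title": "Example title", "tags": ["a", "b"]}},
  {"type": "delete", "id": "doc2"}
]
```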


Now, let me explain the limitations associated with Amazon CloudSearch related to file uploads.

1) Batch size:

The maximum batch size is 5 MB.

2) Document size:

The maximum document size is 1 MB.

3) Document fields:

Documents can have no more than 200 fields.

4) Data loading volume:

You can load one document batch every 10 seconds (approximately 10,000 batches every 24 hours), with each batch size up to 5 MB.
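Given those limits, the upload loop should pace itself to roughly one batch every 10 seconds. A minimal sketch (the `send_batch` callable is a stand-in of my own; in practice it could wrap `aws cloudsearchdomain upload-documents` or boto3's `cloudsearchdomain` client):

```python
import json
import time

def upload_batches(batches, send_batch, interval=10.0, sleep=time.sleep):
    """Send each batch through `send_batch`, waiting `interval` seconds
    between calls to stay within the one-batch-per-10-seconds guideline.

    `send_batch` receives the JSON-serialized batch; `sleep` is injectable
    so the pacing can be tested without real waiting.
    """
    sent = 0
    for i, batch in enumerate(batches):
        if i > 0:
            sleep(interval)  # pace the uploads between batches
        send_batch(json.dumps(batch, separators=(",", ":")))
        sent += 1
    return sent
```

With boto3 the sender could be, for example, `lambda body: client.upload_documents(documents=body, contentType="application/json")` against the domain's document endpoint.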

If you wish to increase the limits, you can contact Amazon CloudSearch, but at the moment Amazon does not allow increasing the upload size limits.

You can submit a request if you need to increase the maximum number of partitions for a search domain. For information about increasing other limits, such as the maximum number of search domains, contact Amazon CloudSearch.

Upvotes: 2
