blackzero

Reputation: 88

MarkLogic - S3 Import

Can we import data from Amazon S3 into MarkLogic using

  1. JavaScript/XQuery API
  2. MarkLogic Content Pump
  3. Any other way?

Please share a reference, if available.

Upvotes: 3

Views: 1027

Answers (4)

DALDEI

Reputation: 3732

If you configure your AWS credentials in the Admin UI, you can use a URL of the form "s3://bucket/key" to access S3 for read or write.
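
If it behaves as described, a read followed by an insert might look like the sketch below. Treat it as an assumption rather than confirmed API behavior: whether xdmp:document-get accepts s3:// locations depends on your MarkLogic version and S3 credential configuration, and the bucket, key, and target URI are placeholders.

xquery version "1.0-ml";

(: Sketch: read an object via an s3:// URL (assumes AWS credentials are
   configured as described above) and insert it into the database.
   "yourbucket" and "test.xml" are placeholders. :)
xdmp:document-insert(
  "/imported/test.xml",
  xdmp:document-get("s3://yourbucket/test.xml")
)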

See the EC2 guide and this similar Stack Overflow question.

Upvotes: 0

Amit Gope

Reputation: 130

I recently faced the same issue and used the following MLCP command to copy data over; it worked for me.

mlcp export -host {host} -port {port} -username {username} -password {password} -output_file_path {S3 path} -collection_filter {collection name to be moved}
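
For the import direction the question asks about, the corresponding command might look like the sketch below, assuming MLCP can read the same S3 path it wrote to (or that you first copy the exported files to a local directory); the {placeholders} follow the same convention as above.

mlcp import -host {host} -port {port} -username {username} -password {password} -input_file_path {S3 path or local copy of the exported files} -output_collections {collection name}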

Upvotes: 0

mg_kedzie

Reputation: 437

Load the test.xml file from an AWS S3 bucket into the database associated with your REST API instance using the /v1/documents service:

curl https://s3.amazonaws.com/yourbucket/test.xml | curl -v --digest --user user:password -H "Content-Type: application/xml" -X PUT --data-binary @- "localhost:8052/v1/documents?uri=/docs/test.xml"
  • replace https://s3.amazonaws.com/yourbucket/test.xml with a valid URL for the object in your AWS S3 bucket
  • replace user:password with valid credentials
  • replace localhost:8052 with the host and port of your MarkLogic REST API app server

Upvotes: 0

Dave Cassel

Reputation: 8422

I'm not an AWS expert by any stretch, but if you know the locations of your data on S3, you can use xdmp:document-get(), with an http:// prefix in the $location, to retrieve documents. You can also use xdmp:http-get(), perhaps to query for the locations of your documents. Once that call has returned, you can insert the content with the usual xdmp:document-insert().
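
A minimal sketch of that approach, assuming a publicly readable (or pre-signed) S3 URL; the URL and the target database URI are placeholders:

xquery version "1.0-ml";

(: Fetch a document from S3 over HTTP and insert it into the database.
   The URL and database URI below are placeholders. :)
let $location := "https://s3.amazonaws.com/yourbucket/test.xml"
return
  xdmp:document-insert("/docs/test.xml", xdmp:document-get($location))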

That approach should be fine for a small number of documents. If you have a large set you want to import, you'll have to factor in the possibility of the transaction timing out.

For a larger data set, you might want to manage the process externally. Here are a few options:

  • export data from S3 onto your local filesystem, then use MLCP to send it to MarkLogic
  • insert a document that has a list of resources at S3 that you want to import; spawn tasks that will each take a group of those resources and import them using xdmp:document-get() (see the sketch after this list)
  • use Java code to pull a document (or batch of documents) from S3, then use the Java Client API to insert that data into MarkLogic
  • once MarkLogic 9 comes out, use the Data Movement SDK, which is intended to make projects like this easier (as of this writing, the DMSDK is still in development)
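
A minimal sketch of the second option above (spawning tasks that fetch and insert documents); the URLs and target URIs are placeholders, and in practice you would read the list from a document and group the resources into batches:

xquery version "1.0-ml";

(: Spawn one task per S3 resource; each task fetches the document over HTTP
   and inserts it under a URI derived from the last path segment of the URL.
   The URLs below are placeholders. :)
let $uris := (
  "https://s3.amazonaws.com/yourbucket/docs/a.xml",
  "https://s3.amazonaws.com/yourbucket/docs/b.xml"
)
for $uri in $uris
return
  xdmp:spawn-function(
    function() {
      xdmp:document-insert(
        "/import/" || fn:tokenize($uri, "/")[fn:last()],
        xdmp:document-get($uri)
      )
    },
    <options xmlns="xdmp:eval">
      <update>true</update>
    </options>
  )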

Upvotes: 3
