blackzero

Reputation: 88

MarkLogic - S3 Import

Can we import data from Amazon S3 into MarkLogic using

  1. JavaScript/XQuery API
  2. MarkLogic Content Pump
  3. Any other way?

Please share a reference, if available.

Upvotes: 3

Views: 1027

Answers (4)

DALDEI

Reputation: 3732

If you configure your AWS credentials in the Admin UI, you can use a URL of the form "s3://bucket/key" to access S3 for read or write.
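
If it behaves as described, a read followed by an insert might look like the sketch below. Treat it as an assumption rather than confirmed API behavior: whether xdmp:document-get accepts s3:// locations depends on your MarkLogic version and S3 credential configuration, and the bucket, key, and target URI are placeholders.

xquery version "1.0-ml";

(: Sketch: read an object via an s3:// URL (assumes AWS credentials are
   configured as described above) and insert it into the database.
   "yourbucket" and "test.xml" are placeholders. :)
xdmp:document-insert(
  "/imported/test.xml",
  xdmp:document-get("s3://yourbucket/test.xml")
)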

See the EC2 guide and this similar Stack Overflow question.

Upvotes: 0

Amit Gope

Reputation: 130

I recently faced the same issue and used the following MLCP command to copy data over; it worked for me.

mlcp export -host {host} -port {port} -username {username} -password {password} -output_file_path {S3 path} -collection_filter {collection name to be moved}
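
For the import direction the question asks about, the corresponding command might look like the sketch below, assuming MLCP can read the same S3 path it wrote to (or that you first copy the exported files to a local directory); the {placeholders} follow the same convention as above.

mlcp import -host {host} -port {port} -username {username} -password {password} -input_file_path {S3 path or local copy of the exported files} -output_collections {collection name}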

Upvotes: 0

mg_kedzie

Reputation: 437

Load the test.xml file from an AWS S3 bucket into the database associated with your REST API instance using the /v1/documents service:

curl https://s3.amazonaws.com/yourbucket/test.xml | curl -v --digest --user user:password -H "Content-Type: application/xml" -X PUT --data-binary @- "localhost:8052/v1/documents?uri=/docs/test.xml"
  • replace https://s3.amazonaws.com/yourbucket/test.xml with a valid URL for the object in your AWS S3 bucket
  • replace user:password with valid credentials
  • replace localhost:8052 with the host and port of your MarkLogic REST API app server

Upvotes: 0

Dave Cassel

Reputation: 8422

I'm not an AWS expert by any stretch, but if you know the locations of your data on S3, you can use xdmp:document-get(), with an http:// prefix in the $location, to retrieve documents. You can also use xdmp:http-get(), perhaps to query for the locations of your documents. Once that call has returned, you can insert the content with the usual xdmp:document-insert().
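
A minimal sketch of that approach, assuming a publicly readable (or pre-signed) S3 URL; the URL and the target database URI are placeholders:

xquery version "1.0-ml";

(: Fetch a document from S3 over HTTP and insert it into the database.
   The URL and database URI below are placeholders. :)
let $location := "https://s3.amazonaws.com/yourbucket/test.xml"
return
  xdmp:document-insert("/docs/test.xml", xdmp:document-get($location))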

That approach should be fine for a small number of documents. If you have a large set you want to import, you'll have to factor in the possibility of the transaction timing out.

For a larger data set, you might want to manage the process externally. Here are a few options:

  • export data from S3 onto your local filesystem, then use MLCP to send it to MarkLogic
  • insert a document that has a list of resources at S3 that you want to import; spawn tasks that will each take a group of those resources and import them using xdmp:document-get() (see the sketch after this list)
  • use Java code to pull a document (or batch of documents) from S3, then use the Java Client API to insert that data into MarkLogic
  • once MarkLogic 9 comes out, use the Data Movement SDK, which is intended to make projects like this easier (as of this writing, the DMSDK is still in development)
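
A minimal sketch of the second option above (spawning tasks that fetch and insert documents); the URLs and target URIs are placeholders, and in practice you would read the list from a document and group the resources into batches:

xquery version "1.0-ml";

(: Spawn one task per S3 resource; each task fetches the document over HTTP
   and inserts it under a URI derived from the last path segment of the URL.
   The URLs below are placeholders. :)
let $uris := (
  "https://s3.amazonaws.com/yourbucket/docs/a.xml",
  "https://s3.amazonaws.com/yourbucket/docs/b.xml"
)
for $uri in $uris
return
  xdmp:spawn-function(
    function() {
      xdmp:document-insert(
        "/import/" || fn:tokenize($uri, "/")[fn:last()],
        xdmp:document-get($uri)
      )
    },
    <options xmlns="xdmp:eval">
      <update>true</update>
    </options>
  )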

Upvotes: 3
