happybayes

Reputation: 321

How to pass document uri & database name to marklogic spark connector?

I am trying this MarkLogic Spark connector tutorial (https://developer.marklogic.com/blog/marklogic-spark-example). I was able to run it, but I found that it reads from the Documents database by default.

The relevant code from the tutorial looks like this:

    JavaPairRDD<DocumentURI, MarkLogicNode> mlRDD = context.newAPIHadoopRDD(
        hdConf,                    // Configuration
        DocumentInputFormat.class, // InputFormat
        DocumentURI.class,         // Key class
        MarkLogicNode.class        // Value class
    );

I was wondering how I can pass a specific document URI and database name so that only that one document is read from that database. For example, the Documents database contains XML files that were created when I imported a CSV file, as described in this question: "Marklogic : Multiple XML files created on document on importing a csv. How to get root Document URI path?" Can someone share sample code showing how to pass the document URI and database name as parameters?

Upvotes: 2

Views: 219

Answers (2)

Hemant Puranik

Reputation: 11

If you refer to the documentation for the MarkLogic Connector for Hadoop, specifically the Input Configuration Properties, you will find the property mapreduce.marklogic.input.documentselector, which takes an XQuery path expression that selects specific documents from the database.
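A minimal sketch of building that property value in plain Java, assuming the connector accepts an fn:doc("...") call (or a parenthesized sequence of such calls) as the document selector path expression; fn:collection("...") is the other common form. The class DocumentSelectorProps, its helper methods, and the URI /people/person-1.xml are hypothetical. The selector only narrows which documents are read; it does not pick the database, which is assumed to be whatever database the connector is already configured to read from.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DocumentSelectorProps {

    // Input property name documented for the MarkLogic Connector for Hadoop.
    static final String DOCUMENT_SELECTOR = "mapreduce.marklogic.input.documentselector";

    // Quote a value as a double-quoted XQuery string literal:
    // '&' must be written as "&amp;" and '"' is escaped by doubling it.
    static String xqueryString(String value) {
        return "\"" + value.replace("&", "&amp;").replace("\"", "\"\"") + "\"";
    }

    // Build a selector for the given URIs, e.g. fn:doc("/a.xml") for one URI
    // or (fn:doc("/a.xml"), fn:doc("/b.xml")) for several.
    // Assumption: the connector accepts fn:doc() expressions as a selector.
    static String selectorForUris(List<String> uris) {
        if (uris.isEmpty()) {
            throw new IllegalArgumentException("at least one URI is required");
        }
        String docs = uris.stream()
                .map(uri -> "fn:doc(" + xqueryString(uri) + ")")
                .collect(Collectors.joining(", "));
        return uris.size() == 1 ? docs : "(" + docs + ")";
    }

    // Properties to copy into the Hadoop Configuration (hdConf) that is
    // passed to newAPIHadoopRDD.
    static Map<String, String> inputProps(List<String> uris) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(DOCUMENT_SELECTOR, selectorForUris(uris));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical URI; replace it with the URI of the document you want.
        Map<String, String> props = inputProps(List.of("/people/person-1.xml"));
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With the Spark code from the question, the entries could be copied into the job configuration before calling newAPIHadoopRDD, for example with inputProps(List.of(uri)).forEach(hdConf::set), since hdConf is a Hadoop Configuration with a set(String, String) method.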

Upvotes: 1

The sample uses the MarkLogic Connector for Hadoop.

With MarkLogic 8, I believe you can set the database with the com.marklogic.output.databasename property in the job configuration.

http://docs.marklogic.com/guide/mapreduce/quickstart#id_38329

Upvotes: 0
