Dev

Reputation: 13753

How to save data in a MongoDB collection using Spark with the mongo-hadoop connector?

I followed the mongo-hadoop connector's documentation.

I am able to transfer data from the inputCol collection to the outputCol collection in the testDB database using:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.bson.BSONObject;

    import com.mongodb.hadoop.MongoInputFormat;
    import com.mongodb.hadoop.MongoOutputFormat;

    // Input configuration: read every document from testDB.inputCol
    Configuration mongodbConfig = new Configuration();
    mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
    mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/testDB.inputCol");

    JavaSparkContext sc = new JavaSparkContext(sparkClient.sparkContext);

    JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
            mongodbConfig,            // Configuration
            MongoInputFormat.class,   // InputFormat: read from a live cluster
            Object.class,             // Key class
            BSONObject.class          // Value class
    );

    // Output configuration: write to testDB.outputCol
    Configuration outputConfig = new Configuration();
    outputConfig.set("mongo.output.format", "com.mongodb.hadoop.MongoOutputFormat");
    outputConfig.set("mongo.output.uri", "mongodb://localhost:27017/testDB.outputCol");

    // The path is required by the Hadoop API but ignored by MongoOutputFormat
    documents.saveAsNewAPIHadoopFile(
            "file:///this-is-completely-unused",
            Object.class,
            BSONObject.class,
            MongoOutputFormat.class,
            outputConfig
    );

I want to save a simple document, say

{"_id":1, "name":"dev"}

in the outputCol collection of the testDB database.

How can I achieve that?

Upvotes: 2

Views: 2113

Answers (2)

Hunter Lin

Reputation: 51

It's the same: just put your BSONObject into a JavaPairRDD<Object, BSONObject> (the Object key can be anything; null should be fine) and save it as you did for documents. See the sketch below.
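
A minimal sketch of that idea, reusing sc and outputConfig from the question (BasicBSONObject is the concrete BSONObject implementation from the org.bson package):

    import java.util.Collections;

    import org.apache.spark.api.java.JavaPairRDD;
    import org.bson.BSONObject;
    import org.bson.BasicBSONObject;
    import scala.Tuple2;

    import com.mongodb.hadoop.MongoOutputFormat;

    // Build the document {"_id":1, "name":"dev"}
    BasicBSONObject doc = new BasicBSONObject();
    doc.put("_id", 1);
    doc.put("name", "dev");

    // Wrap it in a pair RDD; the key is ignored on write, so null is fine
    JavaPairRDD<Object, BSONObject> toSave = sc
            .parallelize(Collections.singletonList(doc))
            .mapToPair(d -> new Tuple2<Object, BSONObject>(null, d));

    // Save exactly as in the question; the file path is required but unused
    toSave.saveAsNewAPIHadoopFile(
            "file:///this-is-completely-unused",
            Object.class,
            BSONObject.class,
            MongoOutputFormat.class,
            outputConfig
    );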

Upvotes: 0

Ajay Gupta

Reputation: 3212

To use a query with the Spark MongoDB Hadoop connector, you can set:

    mongodbConfig.set("mongo.input.query", "{'_id':1,'name':'dev'}");
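
For context, a minimal sketch of where that setting fits, again assuming the input configuration and the sc context from the question; only documents matching the query are read into the RDD:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.bson.BSONObject;

    import com.mongodb.hadoop.MongoInputFormat;

    // Input configuration with a query filter: only matching documents are read
    Configuration mongodbConfig = new Configuration();
    mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/testDB.inputCol");
    mongodbConfig.set("mongo.input.query", "{'_id':1,'name':'dev'}");

    JavaPairRDD<Object, BSONObject> filtered = sc.newAPIHadoopRDD(
            mongodbConfig,
            MongoInputFormat.class,
            Object.class,
            BSONObject.class
    );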

Upvotes: 1
