Reputation: 13753
I followed the mongo-hadoop connector's documentation. I am able to transfer data from the inputCol collection to the outputCol collection in the testDB database using:
Configuration mongodbConfig = new Configuration();
mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/testDB.inputCol");

JavaSparkContext sc = new JavaSparkContext(sparkClient.sparkContext);

JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
        mongodbConfig,          // Configuration
        MongoInputFormat.class, // InputFormat: read from a live cluster
        Object.class,           // Key class
        BSONObject.class        // Value class
);

Configuration outputConfig = new Configuration();
outputConfig.set("mongo.output.format", "com.mongodb.hadoop.MongoOutputFormat");
outputConfig.set("mongo.output.uri", "mongodb://localhost:27017/testDB.outputCol");

documents.saveAsNewAPIHadoopFile(
        "file:///this-is-completely-unused",
        Object.class,
        BSONObject.class,
        MongoOutputFormat.class,
        outputConfig
);
I want to save a simple document, say {"_id": 1, "name": "dev"}, in the outputCol collection of the testDB database. How can I achieve that?
Upvotes: 2
Views: 2113
Reputation: 51
It's the same approach: just put your BSONObject into an RDD[(Object, BSONObject)] (the Object key can be anything; null should be fine) and save it the same way you saved documents, as in the sketch below.
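A rough sketch of what that could look like in Java, assuming the `sc` and `outputConfig` objects from the question are still in scope (the use of `parallelizePairs` and `BasicBSONObject` here is illustrative, not taken from the original answer):

// Build the document {"_id": 1, "name": "dev"}.
BSONObject doc = new BasicBSONObject();
doc.put("_id", 1);
doc.put("name", "dev");

// Wrap it in a pair RDD; per the answer above, the key can be anything (null should be fine).
JavaPairRDD<Object, BSONObject> single = sc.parallelizePairs(
        Collections.singletonList(new Tuple2<Object, BSONObject>(null, doc)));

// Save it with the same output configuration used for `documents`.
single.saveAsNewAPIHadoopFile(
        "file:///this-is-completely-unused",
        Object.class,
        BSONObject.class,
        MongoOutputFormat.class,
        outputConfig);

This needs java.util.Collections, scala.Tuple2, org.bson.BasicBSONObject, and the same Spark/mongo-hadoop imports as the question's code.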
Upvotes: 0
Reputation: 3212
To use a query with the Spark MongoDB Hadoop connector, you can set:
mongodbConfig.set("mongo.input.query", "{'_id':1,'name':'dev'}")
Upvotes: 1