mkach

Reputation: 1

Mongo-Hadoop streaming

I'm new to MongoDB and Hadoop. I'm trying to use MongoDB data as the input to a Hadoop MapReduce streaming job, but I don't know how to specify which collection to read from. This is what I tried:

hadoop jar /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar
-input user/test/input/
-output user/test/output/
-inputformat com.mongodb.hadoop.mapred.MongoInputFormat
-outputformat com.mongodb.hadoop.mapred.MongoOutputFormat
-io mongodb
-D mongo.input.uri=mongodb://localhost/my_dbs.collectionName 
-D stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver 
-mapper /Users/wordcountMapper.py 
-reducer /Users/wordcountReducer.py 
-libjars /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/mongo-hadoop-streaming.jar

But I get the following error:

ERROR streaming.StreamJob: Unrecognized option: -D
Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]

When I tried the following instead, I got a different error:

 hadoop jar /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar 
-input user/input/ 
-output user/test/output 
-inputformat com.mongodb.hadoop.mapred.MongoInputFormat 
-outputformat com.mongodb.hadoop.mapred.MongoOutputFormat 
-io mongodb -jobconf mongo.input.uri=mongodb://localhost/my_dbs.collectionName 
-jobconf stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver 
-mapper /Users/wordcountMapper.py 
-reducer /Users/wordcountReducer.py 
-libjars /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/mongo-hadoop-streaming.jar

ERROR streaming.StreamJob: Unrecognized option: -libjars
Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]

Please help.

Upvotes: 0

Views: 506

Answers (1)

Alekhya Vemavarapu

Reputation: 1155

Please check this link for a better idea of how to connect MongoDB to Hadoop.

Edit:

Alternatively, instead of passing the jar with the -libjars option on the command line, you can add it to the argument list directly in your driver program:

args.add("-libjars");
args.add("/some/path/to/your/jar");
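For context, the Hadoop Streaming documentation notes that generic options such as -D and -libjars must be placed before the streaming options (-input, -output, -mapper, and so on), which is probably why StreamJob rejected them in both commands above; -jobconf is also deprecated in favor of -D. Below is a minimal sketch of a driver-side argument list that respects that ordering. It assumes args is a List<String> that you later hand to the streaming job (for example through Hadoop's ToolRunner, which is not shown here because it needs the Hadoop jars on the classpath). The class name StreamingArgs and the helper buildArgs are hypothetical, and the paths are copied from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingArgs {

    // Hypothetical helper: builds the streaming argument list with the
    // generic options (-libjars, -D) first and the streaming options after.
    static List<String> buildArgs(String libJar, String mongoInputUri) {
        List<String> args = new ArrayList<>();

        // Generic options: these must precede every streaming option.
        args.add("-libjars");
        args.add(libJar);
        args.add("-D");
        args.add("mongo.input.uri=" + mongoInputUri);
        args.add("-D");
        args.add("stream.io.identifier.resolver.class="
                + "com.mongodb.hadoop.streaming.io.MongoIdentifierResolver");

        // Streaming options, taken from the command in the question.
        args.add("-input");
        args.add("user/test/input/");
        args.add("-output");
        args.add("user/test/output/");
        args.add("-inputformat");
        args.add("com.mongodb.hadoop.mapred.MongoInputFormat");
        args.add("-outputformat");
        args.add("com.mongodb.hadoop.mapred.MongoOutputFormat");
        args.add("-io");
        args.add("mongodb");
        args.add("-mapper");
        args.add("/Users/wordcountMapper.py");
        args.add("-reducer");
        args.add("/Users/wordcountReducer.py");
        return args;
    }

    public static void main(String[] unused) {
        List<String> args = buildArgs(
                "/some/path/to/your/jar",
                "mongodb://localhost/my_dbs.collectionName");
        // Print the arguments in the order they would be passed on.
        System.out.println(String.join(" ", args));
    }
}
```

The same ordering applies on the command line: move -libjars and every -D pair so they come directly after the streaming jar path and before -input.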

Upvotes: 1
