Reputation: 11
I'm executing this simple code in a pig script:
REGISTER /home/myuser/mongodb/mongo-2.10.1.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-core_cdh4.3.0-1.2.0.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-pig_cdh4.3.0-1.2.0.jar
set mapred.map.tasks.speculative.execution false;
set mapred.reduce.tasks.speculative.execution false;
col = LOAD 'mongodb://localhost:27017/mydb.mycollection' using com.mongodb.hadoop.pig.MongoLoader ('id:chararray, companyId:chararray, ts:chararray', 'id');
STORE col INTO 'mongodb://localhost:27017/mydb.mycollection2' USING com.mongodb.hadoop.pig.MongoInsertStorage ('', '');
it returns the following error:
Location Config: Configuration: For URI: file:/tmp/temp449583595/tmp-109467318 2014-04-04 14:30:40,913 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration. Details at logfile: /home/myuser/pig/pig_1396614639609.log
the end of file pig_1396614639609.log:
... at org.apache.hadoop.util.RunJar.main(RunJar.java:208) Caused by: java.lang.IllegalArgumentException: Invalid URI Format. URIs must begin with a mongodb:// protocol string. at com.mongodb.hadoop.pig.MongoInsertStorage.setStoreLocation(MongoInsertStorage.java:159) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:576)
... 17 more
I don't know where is the error so that mongodb protocol string "mongodb://" is well-written.
Upvotes: 0
Views: 318
Reputation: 432
I have a similar issue when running LOAD and STORE using mongo-hadoop on the same Pig script.
It throws
java.net.UnknownHostException: localhost:27017 is not a valid Inet address
at org.apache.hadoop.net.NetUtils.verifyHostnames(NetUtils.java:587)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:734)
at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I didn't investigate further, but either is a bug or some parameter related to locking. I don't know.
If I run the same code, but loading and storing in different scripts it runs without a problem.
Upvotes: 0