Pig: STORE with MongoInsertStorage doesn't work

I'm executing this simple code in a Pig script:

REGISTER /home/myuser/mongodb/mongo-2.10.1.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-core_cdh4.3.0-1.2.0.jar
REGISTER /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/mongo-hadoop-cdh4-1.2.0/mongo-hadoop-pig_cdh4.3.0-1.2.0.jar

set mapred.map.tasks.speculative.execution false;
set mapred.reduce.tasks.speculative.execution false;

col = LOAD 'mongodb://localhost:27017/mydb.mycollection' using com.mongodb.hadoop.pig.MongoLoader ('id:chararray, companyId:chararray, ts:chararray', 'id');

STORE col INTO 'mongodb://localhost:27017/mydb.mycollection2' USING com.mongodb.hadoop.pig.MongoInsertStorage ('', '');

It returns the following error:

Location Config: Configuration:  For URI: file:/tmp/temp449583595/tmp-109467318
2014-04-04 14:30:40,913 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Details at logfile: /home/myuser/pig/pig_1396614639609.log

The end of the log file pig_1396614639609.log shows:

...
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.IllegalArgumentException: Invalid URI Format. URIs must begin with a mongodb:// protocol string.
    at com.mongodb.hadoop.pig.MongoInsertStorage.setStoreLocation(MongoInsertStorage.java:159)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:576)
    ... 17 more

I don't know where the error is, since the protocol string "mongodb://" in my STORE URI is written correctly.

Upvotes: 0

Views: 318

Answers (1)

Arian Pasquali

Reputation: 432

I have a similar issue when running LOAD and STORE with mongo-hadoop in the same Pig script.

It throws:

java.net.UnknownHostException: localhost:27017 is not a valid Inet address
    at org.apache.hadoop.net.NetUtils.verifyHostnames(NetUtils.java:587)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:734)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

I didn't investigate further, but it is either a bug or a problem with some parameter related to locking. I'm not sure which.

If I run the same code but do the loading and the storing in separate scripts, it runs without a problem.
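A minimal sketch of that split, reusing the statements from the question. The intermediate path /tmp/mycollection_dump and the use of PigStorage as the hand-off format are assumptions for illustration, not something I have verified against this exact setup. The REGISTER lines are shortened here and should be the same full paths as in the question.

```pig
-- Script 1 (hypothetical name: load_from_mongo.pig)
-- Reads from MongoDB and writes to an intermediate file location.
REGISTER /home/myuser/mongodb/mongo-2.10.1.jar
-- ... plus the same mongo-hadoop core and pig jars as in the question

col = LOAD 'mongodb://localhost:27017/mydb.mycollection'
      USING com.mongodb.hadoop.pig.MongoLoader('id:chararray, companyId:chararray, ts:chararray', 'id');

-- '/tmp/mycollection_dump' is an assumed intermediate path
STORE col INTO '/tmp/mycollection_dump' USING PigStorage();


-- Script 2 (hypothetical name: store_to_mongo.pig)
-- Reads the intermediate data back and writes it to MongoDB.
REGISTER /home/myuser/mongodb/mongo-2.10.1.jar
-- ... plus the same mongo-hadoop core and pig jars as in the question

set mapred.map.tasks.speculative.execution false;
set mapred.reduce.tasks.speculative.execution false;

col = LOAD '/tmp/mycollection_dump' USING PigStorage()
      AS (id:chararray, companyId:chararray, ts:chararray);

STORE col INTO 'mongodb://localhost:27017/mydb.mycollection2'
      USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');
```

Run the first script to completion before starting the second. This only works around the problem; it does not explain why LOAD and STORE fail together in one script.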

Upvotes: 0
