Reputation: 4266
This is similar to my question here, but this time it's Java, not Python, that is causing me problems.
I have followed the steps advised here (to the best of my knowledge), but since I'm using hadoop-2.6.1 I think I should be using the old API rather than the new API referred to in the example.
I'm working on Ubuntu and the various component versions I have are
My Java program is basic
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import com.mongodb.hadoop.MongoInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.bson.BSONObject;

public class SimpleApp {
    public static void main(String[] args) {
        Configuration mongodbConfig = new Configuration();
        mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
        mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/db.collection");

        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
            mongodbConfig,          // Configuration
            MongoInputFormat.class, // InputFormat: read from a live cluster.
            Object.class,           // Key class
            BSONObject.class        // Value class
        );
    }
}
It is building fine using Maven (mvn package) with the following pom file
<project>
  <groupId>edu.berkeley</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.5.1</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongo-java-driver</artifactId>
      <version>3.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb.mongo-hadoop</groupId>
      <artifactId>mongo-hadoop-core</artifactId>
      <version>1.4.2</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
I then submit the jar
/usr/local/share/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --class "SimpleApp" --master local[4] target/simple-project-1.0.jar
and get the following error
Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/hadoop/MongoInputFormat
at SimpleApp.main(SimpleApp.java:18)
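A NoClassDefFoundError for a class that compiled without complaint usually means the class was available at build time but is not on the classpath at run time. A minimal sketch of how that could be checked from inside the JVM follows; the class name ClasspathCheck and the method isOnClasspath are hypothetical helpers written for illustration and are not part of Spark or mongo-hadoop.

```java
// Hypothetical diagnostic helper (not part of Spark or mongo-hadoop).
final class ClasspathCheck {

    // Returns true if the named class can be located by this class's class loader.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "com.mongodb.hadoop.MongoInputFormat";
        String verdict = isOnClasspath(name) ? " is" : " is NOT";
        System.out.println(name + verdict + " on the runtime classpath");
    }
}
```

Calling isOnClasspath("com.mongodb.hadoop.MongoInputFormat") at the top of SimpleApp.main, before the RDD is created, would show whether the submitted jar actually carries the mongo-hadoop classes.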
I edited this question on the 18th December as it had grown far too confusing and verbose. Previous comments might look irrelevant. The context of the question, however, is the same.
Upvotes: 5
Views: 3826
Reputation: 628
I faced the same problems, but after a lot of trial and error I got it working with the code below. I'm running a Maven project with NetBeans on Ubuntu with Java 7. Hope this helps.
Include the maven-shade-plugin if there are any conflicts between classes.
P.S.: I don't know about your particular error, but I have faced plenty like it, and this code runs perfectly.
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>1.5.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.5.1</version>
  </dependency>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.14</version>
  </dependency>
  <dependency>
    <groupId>org.mongodb.mongo-hadoop</groupId>
    <artifactId>mongo-hadoop-core</artifactId>
    <version>1.4.1</version>
  </dependency>
</dependencies>
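The maven-shade-plugin is mentioned above without its configuration, so here is a minimal sketch of an entry that could go inside the <build><plugins> section of the pom. The plugin version and the META-INF signature-file filter are assumptions, not taken from the original project; adjust them to your setup.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version> <!-- assumed version -->
  <executions>
    <execution>
      <!-- build the shaded (uber) jar as part of "mvn package" -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <filter>
            <!-- assumed filter: drop signature files copied in from signed dependencies -->
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, the jar produced by mvn package should also contain the dependency classes (including mongo-hadoop-core), so they are available when the jar is passed to spark-submit.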
Java code
Configuration conf = new Configuration();
conf.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
conf.set("mongo.input.uri", "mongodb://localhost:27017/databasename.collectionname");
SparkConf sconf = new SparkConf().setMaster("local").setAppName("Spark UM Jar");
// create the Spark context from sconf
JavaSparkContext sc = new JavaSparkContext(sconf);
// newAPIHadoopRDD yields (key, BSONObject) pairs; map each pair to a User
JavaRDD<User> UserMaster = sc.newAPIHadoopRDD(conf, MongoInputFormat.class, Object.class, BSONObject.class)
        .map(new Function<Tuple2<Object, BSONObject>, User>() {
            @Override
            public User call(Tuple2<Object, BSONObject> v1) throws Exception {
                //return User
            }
        });
Upvotes: 3