Reputation: 45
I'm trying to run some tests on Apache Spark (v1.3.0). I have a simple Java 8 class:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {

    private JavaSparkContext ctx;
    private String inputFile, outputFile;

    public WordCount(String inputFile, String outputFile) {
        this.inputFile = inputFile;
        this.outputFile = outputFile;
        // Initialize the Spark context
        ctx = new JavaSparkContext("local", "WordCount",
                System.getenv("SPARK_HOME"), System.getenv("JARS"));
    }

    public static void main(String... args) {
        String inputFile = "/home/workspace/spark/src/main/resources/inferno.txt"; // args[0];
        String outputFile = "/home/workspace/spark/src/main/resources/dv"; // args[1];
        WordCount wc = new WordCount(inputFile, outputFile);
        wc.doWordCount();
        wc.close();
    }

    public void doWordCount() {
        long start = System.currentTimeMillis();
        JavaRDD<String> inputRdd = ctx.textFile(inputFile);
        JavaPairRDD<String, Integer> count = inputRdd.flatMapToPair((String s) -> {
            List<Tuple2<String, Integer>> list = new ArrayList<>();
            Arrays.asList(s.split(" ")).forEach(s1 -> list.add(new Tuple2<String, Integer>(s1, 1)));
            return list;
        }).reduceByKey((x, y) -> x + y);
        Tuple2<String, Integer> max = count.max(new Tuple2Comparator());
        System.out.println(max);
        // count.saveAsTextFile(outputFile);
        long end = System.currentTimeMillis();
        System.out.println(String.format("Time in ms is: %d", end - start));
    }

    public void close() {
        ctx.stop();
    }
}
The Tuple2Comparator class is:
import java.io.Serializable;
import java.util.Comparator;

import scala.Tuple2;

public class Tuple2Comparator implements Comparator<Tuple2<String, Integer>>, Serializable {

    private static final long serialVersionUID = 103955884403243585L;

    @Override
    public int compare(Tuple2<String, Integer> o1, Tuple2<String, Integer> o2) {
        return o2._2() - o1._2();
    }
}
When I run it, I get the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.api.java.JavaPairRDD.max(Ljava/util/Comparator;)Lscala/Tuple2;
at it.conker.spark.base.WordCount.doWordCount2(WordCount.java:69)
at it.conker.spark.base.WordCount.main(WordCount.java:41)
This is my pom file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <groupId>it.conker.spark</groupId>
    <artifactId>learning-spark-by-example</artifactId>
    <modelVersion>4.0.0</modelVersion>
    <name>Learning Spark by example</name>
    <packaging>jar</packaging>
    <version>0.0.1</version>

    <dependencies>
        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.3.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <properties>
        <java.version>1.8</java.version>
    </properties>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.1</version>
                    <configuration>
                        <source>${java.version}</source>
                        <target>${java.version}</target>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>
I run the class from within Eclipse. Can someone tell me where I went wrong?
Upvotes: 3
Views: 1204
Reputation: 3702
You should change the scope of the Spark dependency from provided to compile; it will then work fine when you run it on your local machine.
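In the pom above, that means changing the Spark dependency to:

<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.0</version>
    <scope>compile</scope>
</dependency>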
Upvotes: 0
Reputation: 13841
I believe that this is an occurrence of SPARK-3266, which will be fixed in all upcoming Spark maintenance releases (see my pull request).

One workaround, which I have not tested, would be to cast count to a JavaRDDLike<Tuple2<String, Integer>, ?> before calling max(), since a similar workaround worked for someone else (see my comment on JIRA).
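As a rough, untested sketch (JavaRDDLike lives in org.apache.spark.api.java), the call in doWordCount() would become:

// Refer to the RDD through JavaRDDLike so the compiled call site resolves
// max() against that interface, whose signature exists in the runtime bytecode.
JavaRDDLike<Tuple2<String, Integer>, ?> rddLike =
        (JavaRDDLike<Tuple2<String, Integer>, ?>) count;
Tuple2<String, Integer> max = rddLike.max(new Tuple2Comparator());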
Upvotes: 4