Reputation: 31
I am using Cloudera CDH 4.2.0 and Spark.
I just want to try out some of the examples that ship with Spark.
// HdfsTest.scala
package spark.examples

import spark._

object HdfsTest {
  def main(args: Array[String]) {
    val sc = new SparkContext(args(0), "HdfsTest",
      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))
    val file = sc.textFile("hdfs://n1.example.com/user/cloudera/data/navi_test.csv")
    val mapped = file.map(s => s.length).cache()
    for (iter <- 1 to 10) {
      val start = System.currentTimeMillis()
      for (x <- mapped) { x + 2 }
      // println("Processing: " + x)
      val end = System.currentTimeMillis()
      println("Iteration " + iter + " took " + (end - start) + " ms")
    }
    System.exit(0)
  }
}
It compiles fine, but there is always this problem at runtime:
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.HftpFileSystem could not be instantiated: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V from class org.apache.hadoop.hdfs.HftpFileSystem
at java.util.ServiceLoader.fail(ServiceLoader.java:224)
at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2229)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2240)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2257)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2296)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2278)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:316)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:162)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
at spark.SparkContext.hadoopFile(SparkContext.scala:263)
at spark.SparkContext.textFile(SparkContext.scala:235)
at spark.examples.HdfsTest$.main(HdfsTest.scala:9)
at spark.examples.HdfsTest.main(HdfsTest.scala)
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V from class org.apache.hadoop.hdfs.HftpFileSystem
at org.apache.hadoop.hdfs.HftpFileSystem.<clinit>(HftpFileSystem.java:84)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
at java.lang.Class.newInstance0(Class.java:374)
at java.lang.Class.newInstance(Class.java:327)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
... 16 more
I have searched on Google but found nothing about this kind of exception for Spark and HDFS.
The problem occurs at this line:
val file = sc.textFile("hdfs://n1.example.com/user/cloudera/data/navi_test.csv")
13/04/04 12:20:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I also get this warning. Maybe I should add some Hadoop paths to the CLASSPATH?
Feel free to give any clue. =)
Thank you all.
REN Hao
Upvotes: 3
Views: 10358
Reputation: 684
You can set Cloudera's Hadoop version with an environment variable when building Spark. Look up your exact artifact version in Cloudera's Maven repo; it should be this:
SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 sbt/sbt assembly publish-local
Make sure you run whatever you run with the same JVM you used to build Spark. There are also pre-built Spark packages for different Cloudera Hadoop distributions, e.g. http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz
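After publish-local, the CDH-flavoured Spark artifacts land in your local Ivy repository, so your own sbt project can depend on them directly. A minimal sketch, assuming Spark 0.8.0-incubating built as above (check ~/.ivy2/local for the exact coordinates your publish produced):
// build.sbt of your driver project -- coordinates assume the 0.8.0-incubating publish above
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.0-incubating"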
Upvotes: 0
Reputation: 13821
(This question was also asked / answered on the spark-users mailing list).
You need to compile Spark against the particular version of Hadoop/HDFS running on your cluster. From the Spark documentation:
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported storage systems. Because the HDFS protocol has changed in different versions of Hadoop, you must build Spark against the same version that your cluster runs. You can change the version by setting the HADOOP_VERSION variable at the top of project/SparkBuild.scala, then rebuilding Spark (sbt/sbt clean compile).
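For example, with CDH 4.2.0 the relevant lines at the top of project/SparkBuild.scala would look roughly like this (a sketch based on the 0.7-era build file; the exact surrounding code may differ in your checkout):
// project/SparkBuild.scala -- point the build at Cloudera's MR1 artifact
val HADOOP_VERSION = "2.0.0-mr1-cdh4.2.0"
val HADOOP_MAJOR_VERSION = "2"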
The spark-users mailing list archives contain several questions about compiling against specific Hadoop versions, so I would search there if you run into any problems when building Spark.
Upvotes: 4
Reputation: 1790
This might be a problem related to the Java installed on your system. Hadoop requires (Sun) Java 1.6+. Make sure you have:
JAVA_HOME="/usr/lib/jvm/java-6-sun"
Upvotes: -1