redocpot

Reputation: 31

Using Spark to access HDFS fails

I am using Cloudera 4.2.0 and Spark.

I just want to try out some of the examples that ship with Spark.

// HdfsTest.scala
package spark.examples

import spark._

object HdfsTest {
  def main(args: Array[String]) {
    val sc = new SparkContext(args(0), "HdfsTest",
      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))

    val file = sc.textFile("hdfs://n1.example.com/user/cloudera/data/navi_test.csv")
    val mapped = file.map(s => s.length).cache()
    for (iter <- 1 to 10) {
      val start = System.currentTimeMillis()
      for (x <- mapped) { x + 2 }
      //  println("Processing: " + x)
      val end = System.currentTimeMillis()
      println("Iteration " + iter + " took " + (end-start) + " ms")
    }
    System.exit(0)
  }
}
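
For reference, I launch it with Spark's run script, along these lines (the exact master URL depends on the setup):

./run spark.examples.HdfsTest <master-url>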

It compiles fine, but I always run into this problem at runtime:

Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.HftpFileSystem could not be instantiated: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V from class org.apache.hadoop.hdfs.HftpFileSystem
    at java.util.ServiceLoader.fail(ServiceLoader.java:224)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2229)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2240)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2257)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2296)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2278)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:316)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:162)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
    at spark.SparkContext.hadoopFile(SparkContext.scala:263)
    at spark.SparkContext.textFile(SparkContext.scala:235)
    at spark.examples.HdfsTest$.main(HdfsTest.scala:9)
    at spark.examples.HdfsTest.main(HdfsTest.scala)
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V from class org.apache.hadoop.hdfs.HftpFileSystem
    at org.apache.hadoop.hdfs.HftpFileSystem.<clinit>(HftpFileSystem.java:84)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
    at java.lang.Class.newInstance0(Class.java:374)
    at java.lang.Class.newInstance(Class.java:327)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
... 16 more

I have searched on Google but found no clue about this kind of exception with Spark and HDFS.

val file = sc.textFile("hdfs://n1.example.com/user/cloudera/data/navi_test.csv") is where the problem occurs.

13/04/04 12:20:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I also get this warning. Maybe I should add some Hadoop paths to the CLASSPATH.
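
Something along these lines, perhaps (I am guessing here; as far as I know, Spark of this vintage picks up the SPARK_CLASSPATH environment variable, and hadoop classpath prints the Hadoop jar paths):

export SPARK_CLASSPATH=$(hadoop classpath)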

Any clue would be appreciated. =)

Thank you all.

REN Hao

Upvotes: 3

Views: 10358

Answers (3)

Utgarda

Reputation: 684

You can set Cloudera's Hadoop version with an environment variable when building Spark. Look up your exact artifact version in Cloudera's Maven repo; it should be this:

SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 sbt/sbt assembly publish-local

Make sure you run your application with the same JVM that you used to build Spark. Also, there are pre-built Spark packages for different Cloudera Hadoop distributions, like http://spark-project.org/download/spark-0.8.0-incubating-bin-cdh4.tgz
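
If you want to check what that version string pulls in, the CDH artifacts live in Cloudera's Maven repository; in sbt terms the dependency would look roughly like this (a sketch for orientation only, not a line taken from Spark's build files):

resolvers += "Cloudera repo" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-cdh4.2.0"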

Upvotes: 0

Josh Rosen

Reputation: 13821

(This question was also asked / answered on the spark-users mailing list).

You need to compile Spark against the particular version of Hadoop/HDFS running on your cluster. From the Spark documentation:

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported storage systems. Because the HDFS protocol has changed in different versions of Hadoop, you must build Spark against the same version that your cluster runs. You can change the version by setting the HADOOP_VERSION variable at the top of project/SparkBuild.scala, then rebuilding Spark (sbt/sbt clean compile).
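
Concretely, for CDH 4.2.0 that edit would look roughly like this (an illustrative excerpt; the rest of project/SparkBuild.scala is omitted, and the default value differs between Spark releases):

// project/SparkBuild.scala
val HADOOP_VERSION = "2.0.0-cdh4.2.0"

After changing it, rebuild with sbt/sbt clean compile.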

The spark-users mailing list archives contain several questions about compiling against specific Hadoop versions, so I would search there if you run into any problems when building Spark.

Upvotes: 4

polerto

Reputation: 1790

This might be a problem related to the Java installed on your system. Hadoop requires (Sun) Java 1.6+. Make sure you have:

JAVA_HOME="/usr/lib/jvm/java-6-sun"
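
You can verify which Java your shell actually picks up with:

java -version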

Upvotes: -1
