NateH06

Reputation: 3594

SBT project java.io.FileNotFoundException:FileNotFoundException: HADOOP_HOME unset

I'm trying to use an AvroParquetWriter to convert a file in Avro format to a Parquet file. I load up the schema:

import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter // from the parquet-avro artifact
val schema: org.apache.avro.Schema = ... getSchema(...)
val parquetFile = new Path("Location/for/parquetFile.txt")
val writer = new AvroParquetWriter[GenericRecord](parquetFile, schema)

My code runs fine up until it gets to initializing the AvroParquetWriter. Then it throws this error:

> java.lang.RuntimeException: java.io.FileNotFoundException:
> java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
> -see https://wiki.apache.org/hadoop/WindowsProblems
>   at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:722)
>   at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:256)
>   at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:273)
>   at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:767)
>   at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:235)
> ...etc
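(For reference, the two settings the stack trace complains about can be checked directly with plain Scala standard-library calls; nothing here is project-specific:)

// Hadoop's Shell class consults both of these on Windows.
println(sys.env.get("HADOOP_HOME"))       // environment variable
println(sys.props.get("hadoop.home.dir")) // JVM system property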

The advice it seems to give, and the advice I'm finding, is related to how to fix this if you are running a Hadoop cluster on your machine. However, I am not running a Hadoop cluster, nor am I aiming to. I have imported some of its libraries to use with various other pieces of my program in my SBT file, but this does not spin up a local cluster.

It just started doing this. Of my two colleagues, one is able to run this without issue, and the other just started getting the same error as me. Here are the relevant parts of my build.sbt:

lazy val root = (project in file("."))
  .settings(
    commonSettings,
    name := "My project",
    version := "0.1",
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-common" % "2.9.0",
      "com.typesafe.akka" %% "akka-actor" % "2.5.2",
      "com.lightbend.akka" %% "akka-stream-alpakka-s3" % "0.9",
      "com.enragedginger" % "akka-quartz-scheduler_2.12" % "1.6.0-akka-2.4.x",
      "com.typesafe.akka" % "akka-agent_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-remote_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-stream_2.12" % "2.5.2",
      "org.apache.kafka" % "kafka-clients" % "0.10.2.1",
      "com.typesafe.akka" %% "akka-stream-kafka" % "0.16",
      "com.typesafe.akka" %% "akka-persistence" % "2.5.2",
      "org.iq80.leveldb" % "leveldb" % "0.7",
      "org.fusesource.leveldbjni" % "leveldbjni-all" % "1.8",
      "javax.mail" % "javax.mail-api" % "1.5.6",
      "com.sun.mail" % "javax.mail" % "1.5.6",
      "commons-io" % "commons-io" % "2.5",
      "org.apache.avro" % "avro" % "1.8.1",
      "net.liftweb" % "lift-json_2.12" % "3.1.0-M1",
      "com.google.code.gson" % "gson" % "2.8.1",
      "org.json4s" %% "json4s-jackson" % "3.5.2",
      "com.amazonaws" % "aws-java-sdk-s3" % "1.11.149",
      //"com.amazonaws" % "aws-java-sdk" % "1.11.286",
      "org.scalikejdbc" %% "scalikejdbc" % "3.0.0",
      "org.scalikejdbc" %% "scalikejdbc-config" % "3.0.0",
      "org.scalikejdbc" % "scalikejdbc-interpolation_2.12" % "3.0.2",
      "com.microsoft.sqlserver" % "mssql-jdbc" % "6.1.0.jre8",
      "org.apache.commons" % "commons-pool2" % "2.4.2",
      "commons-pool" % "commons-pool" % "1.6",
      "com.jcraft" % "jsch" % "0.1.54",
      "ch.qos.logback" % "logback-classic" % "1.2.3",
      "com.typesafe.scala-logging" %% "scala-logging" % "3.7.2",
      "org.scalactic" %% "scalactic" % "3.0.4",
      "mysql" % "mysql-connector-java" % "8.0.8-dmr",
      "org.scalatest" %% "scalatest" % "3.0.4" % "test"
    )
  )
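(For reference: AvroParquetWriter does not come from hadoop-common but from the parquet-avro artifact, which would be among the entries elided above. A typical SBT line looks like the following; the version shown is an assumption, not taken from this build.)

libraryDependencies += "org.apache.parquet" % "parquet-avro" % "1.9.0" // assumed version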

Any ideas as to why the Hadoop-related code fails like this?

Upvotes: 1

Views: 5567

Answers (1)

NateH06

Reputation: 3594

The answer was to follow the error message's suggestion:

  1. I downloaded the latest version of winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin

  2. Then I manually created the directory structure C:/Users/MyName/Hadoop/bin - note, the bin MUST be there. You can actually call the Hadoop/ directory whatever you want, but the bin/ directory must sit one level inside it.

  3. I placed the winutils.exe in the bin directory.

  4. In my code, I had to put this line above the initialization of the parquet writer (I'd imagine it can go anywhere before the writer is initialized) to set the Hadoop home:


System.setProperty("hadoop.home.dir", "C:/Users/nhanak/Hadoop/")
val writer = new AvroParquetWriter[GenericRecord](parquetFile,iInfo.schema)
  5. Optional - if you want to keep this within your project rather than rely on a path on your local machine (for example, if others are going to pull this repo, or you want to pack it into a jar to send off everywhere), create a directory structure within the project and store the winutils.exe inside of it. So, say you create the directory structure src/main/resources/HadoopResources/bin in your project and place the winutils.exe in that bin. Then, to make use of the winutils.exe, set the Hadoop home like this:


import java.io.File

val file = new File("src/main/resources/HadoopResources")
System.setProperty("hadoop.home.dir", file.getAbsolutePath)

Upvotes: 3
