Reputation: 3594
I'm trying to use an AvroParquetWriter to convert a file in Avro format to a parquet file. I load up the schema
val schema: org.apache.avro.Schema = ... getSchema(...)
val parquetFile = new Path("Location/for/parquetFile.txt")
val writer = new AvroParquetWriter[GenericRecord](parquetFile, schema)
My code runs fine up until it gets to initializing the AvroParquetWriter. Then it throws this error:
> java.lang.RuntimeException: java.io.FileNotFoundException:
> java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
> at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:722)
> at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:256)
> at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:273)
> at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:767)
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:235)
> ...etc
The advice in the error message, and the advice I keep finding, relates to fixing this when you are running a Hadoop cluster on your machine. However, I am not running a Hadoop cluster, nor am I aiming to. I have added some Hadoop libraries to my SBT file for use in other parts of my program, but that does not spin up a local cluster.
It just started doing this. Of my two other colleagues, one can run this without issue, and the other just started getting the same error as me. Here are the relevant parts of my build.sbt:
lazy val root = (project in file("."))
  .settings(
    commonSettings,
    name := "My project",
    version := "0.1",
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-common" % "2.9.0",
      "com.typesafe.akka" %% "akka-actor" % "2.5.2",
      "com.lightbend.akka" %% "akka-stream-alpakka-s3" % "0.9",
      "com.enragedginger" % "akka-quartz-scheduler_2.12" % "1.6.0-akka-2.4.x",
      "com.typesafe.akka" % "akka-agent_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-remote_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-stream_2.12" % "2.5.2",
      "org.apache.kafka" % "kafka-clients" % "0.10.2.1",
      "com.typesafe.akka" %% "akka-stream-kafka" % "0.16",
      "com.typesafe.akka" %% "akka-persistence" % "2.5.2",
      "org.iq80.leveldb" % "leveldb" % "0.7",
      "org.fusesource.leveldbjni" % "leveldbjni-all" % "1.8",
      "javax.mail" % "javax.mail-api" % "1.5.6",
      "com.sun.mail" % "javax.mail" % "1.5.6",
      "commons-io" % "commons-io" % "2.5",
      "org.apache.avro" % "avro" % "1.8.1",
      "net.liftweb" % "lift-json_2.12" % "3.1.0-M1",
      "com.google.code.gson" % "gson" % "2.8.1",
      "org.json4s" %% "json4s-jackson" % "3.5.2",
      "com.amazonaws" % "aws-java-sdk-s3" % "1.11.149",
      //"com.amazonaws" % "aws-java-sdk" % "1.11.286",
      "org.scalikejdbc" %% "scalikejdbc" % "3.0.0",
      "org.scalikejdbc" %% "scalikejdbc-config" % "3.0.0",
      "org.scalikejdbc" % "scalikejdbc-interpolation_2.12" % "3.0.2",
      "com.microsoft.sqlserver" % "mssql-jdbc" % "6.1.0.jre8",
      "org.apache.commons" % "commons-pool2" % "2.4.2",
      "commons-pool" % "commons-pool" % "1.6",
      "com.jcraft" % "jsch" % "0.1.54",
      "ch.qos.logback" % "logback-classic" % "1.2.3",
      "com.typesafe.scala-logging" %% "scala-logging" % "3.7.2",
      "org.scalactic" %% "scalactic" % "3.0.4",
      "mysql" % "mysql-connector-java" % "8.0.8-dmr",
      "org.scalatest" %% "scalatest" % "3.0.4" % "test"
    )
  )
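For what it's worth, AvroParquetWriter itself does not come from any of the dependencies listed above; it lives in parquet-avro, which is presumably in the omitted part of the list. A coordinate from the same era as these versions would look like this (the version here is an assumption, not from the question):
// Hypothetical: parquet-avro is the artifact that provides AvroParquetWriter.
libraryDependencies += "org.apache.parquet" % "parquet-avro" % "1.8.2"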
Any ideas as to why the Hadoop-related code fails like this?
Upvotes: 1
Views: 5567
Reputation: 3594
The answer was to follow the suggestion in the error message. Even without a cluster, writing to a local path goes through Hadoop's RawLocalFileSystem, which on Windows shells out to winutils.exe to set file permissions; that is the call chain visible in the stack trace.
I downloaded the latest winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin.
Then I manually created the directory structure C:/Users/MyName/Hadoop/bin. Note that the bin directory MUST be there; you can name the Hadoop directory whatever you want, but bin/ must sit exactly one level inside it. I placed winutils.exe in that bin directory.
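To make the layout requirement concrete, here is a small sanity check (a sketch; the path is just the example from above):
import java.io.File

// Example path from above; point this at wherever you created the Hadoop directory.
val hadoopHome = new File("C:/Users/MyName/Hadoop")
val winutils = new File(hadoopHome, "bin/winutils.exe")

// Hadoop resolves %HADOOP_HOME%\bin\winutils.exe, so bin/ must sit exactly one level inside.
require(winutils.isFile, s"Expected winutils.exe at ${winutils.getAbsolutePath}")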
In my code I had to put this line before initializing the Parquet writer (I'd imagine it can go anywhere before the writer is initialized) to set the Hadoop home:
System.setProperty("hadoop.home.dir", "C:/Users/nhanak/Hadoop/")
val writer = new AvroParquetWriter[GenericRecord](parquetFile, iInfo.schema)
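Putting it all together, a minimal end-to-end sketch of the conversion might look like this (my assumptions: the parquet-avro 1.8.x two-argument constructor, and placeholder file names):
import java.io.File

import org.apache.avro.Schema
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter

// Point Hadoop at the directory containing bin/winutils.exe (Windows only).
System.setProperty("hadoop.home.dir", "C:/Users/nhanak/Hadoop/")

// Open the source Avro file and reuse its embedded schema.
val avroFile = new File("input.avro") // placeholder name
val reader = new DataFileReader[GenericRecord](avroFile, new GenericDatumReader[GenericRecord]())
val schema: Schema = reader.getSchema

// The two-argument constructor is deprecated in newer parquet-avro releases
// in favor of AvroParquetWriter.builder, but it still works here.
val writer = new AvroParquetWriter[GenericRecord](new Path("output.parquet"), schema)

try {
  while (reader.hasNext) writer.write(reader.next())
} finally {
  writer.close()
  reader.close()
}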
Alternatively, you can keep winutils.exe inside the project itself: create src/main/resources/HadoopResources/bin in your project and place winutils.exe in that bin. Then, to make use of winutils.exe, you need to set the Hadoop home like this:
val file = new File("src/main/resources/HadoopResources")
System.setProperty("hadoop.home.dir", file.getAbsolutePath)
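The upside of this variant is that the fix travels with the repository, which avoids the "works on one colleague's machine but not another's" situation from the question. Note that the relative path src/main/resources/HadoopResources resolves against the JVM's working directory, so this assumes the program is launched from the project root.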
Upvotes: 3