Luis

Reputation: 159

Errors resolving org.apache.hadoop dependencies with sbt offline

I'm trying to freeze the dependencies of a Spark project so it can be built offline (sbt can no longer download dependencies). This is the process I followed:

  1. Create an sbt project and compile it with an internet connection
  2. Halt internet connectivity
  3. Verify that the project keeps compiling
  4. Duplicate the sbt project and delete the target folder
  5. Tell the build.sbt file to resolve the dependencies from the /.ivy2/cache folder

This is the build.sbt:

name := "Test"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

resolvers += Resolver.file("Frozen IVY2 Cache Dependences", file("/home/luis/.ivy2/cache")) (Resolver.ivyStylePatterns) ivys "/home/luis/.ivy2/cache/[organisation]/[module]/ivy-[revision].xml"  artifacts  "/home/luis/.ivy2/cache/[organisation]/[module]/[type]s/[module]-[revision].[type]"
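As an aside (not part of the original question), sbt also ships a built-in `offline` setting that tells the resolver to stay off the network. A minimal sketch of adding it to build.sbt, assuming the local cache metadata is already complete; it does not repair broken ivy-*.xml files by itself:

```scala
// Hypothetical extra line for build.sbt: keep dependency resolution local.
// Assumes every needed artifact is already in the Ivy cache.
offline := true
```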

In fact, the process that led to this build.sbt is exactly the same as the one described in this (unanswered) question:

Troubles with sbt compiling offline using org.apache.hadoop/* dependencies

I included the appropriate Ivy-style patterns to point to the right ivy-[revision].xml file.

When I compile, sbt finds the right path in the "frozen" .ivy2/cache repository for every dependency; however, I'm getting warnings and errors related to the parsing of the file "ivy-[revision].xml.original" for these four dependencies:

[warn]  Note: Unresolved dependencies path:
[warn]          org.apache.hadoop:hadoop-mapreduce-client-app:2.2.0
[warn]            +- org.apache.hadoop:hadoop-client:2.2.0
[warn]            +- org.apache.spark:spark-core_2.10:1.3.0 (/home/luis/Test/build.sbt#L7-8)
[warn]            +- Test:Test_2.10:1.0
[warn]          org.apache.hadoop:hadoop-yarn-api:2.2.0
[warn]            +- org.apache.hadoop:hadoop-client:2.2.0
[warn]            +- org.apache.spark:spark-core_2.10:1.3.0 (/home/luis/Test/build.sbt#L7-8)
[warn]            +- Test:Test_2.10:1.0
[warn]          org.apache.hadoop:hadoop-mapreduce-client-core:2.2.0
[warn]            +- org.apache.hadoop:hadoop-client:2.2.0
[warn]            +- org.apache.spark:spark-core_2.10:1.3.0 (/home/luis/Test/build.sbt#L7-8)
[warn]            +- Test:Test_2.10:1.0
[warn]          org.apache.hadoop:hadoop-mapreduce-client-jobclient:2.2.0
[warn]            +- org.apache.hadoop:hadoop-client:2.2.0
[warn]            +- org.apache.spark:spark-core_2.10:1.3.0 (/home/luis/Test/build.sbt#L7-8)
[warn]            +- Test:Test_2.10:1.0

Let's concentrate on one of these dependencies, since the warnings and errors are the same for all of them. Say, org.apache.hadoop:hadoop-mapreduce-client-app:2.2.0.

Examples of the warnings produced while parsing the file "ivy-[revision].xml.original" are:

[warn] xml parsing: ivy-2.2.0.xml.original:18:69: schema_reference.4: Failed to read schema document 'http://maven.apache.org/xsd/maven-4.0.0.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
[warn] xml parsing: ivy-2.2.0.xml.original:19:11: schema_reference.4: Failed to read schema document 'http://maven.apache.org/xsd/maven-4.0.0.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
[warn] xml parsing: ivy-2.2.0.xml.original:20:17: schema_reference.4: Failed to read schema document 'http://maven.apache.org/xsd/maven-4.0.0.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
.......
.......

[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.hadoop#hadoop-mapreduce-client-app;2.2.0: java.text.ParseException: [xml parsing: ivy-2.2.0.xml.original:18:69: cvc-elt.1: Cannot find the declaration of element 'project'. in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag project in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag parent in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag artifactId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag groupId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag version in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag modelVersion in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag groupId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag artifactId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag version in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag name in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag properties in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag applink.base in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag mr.basedir in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original

These result in the errors:

[error] (*:update) sbt.ResolveException: unresolved dependency: org.apache.hadoop#hadoop-mapreduce-client-app;2.2.0: java.text.ParseException: [xml parsing: ivy-2.2.0.xml.original:18:69: cvc-elt.1: Cannot find the declaration of element 'project'. in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag project in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag parent in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag artifactId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag groupId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag version in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag modelVersion in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag groupId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag artifactId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag version in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag name in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag properties in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag applink.base in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag mr.basedir in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] ]

Just to clarify, the content of the file ivy-2.2.0.xml.original looks like this:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                      http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <parent>
    <artifactId>hadoop-yarn</artifactId>
    <groupId>org.apache.hadoop</groupId>
    <version>2.2.0</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-api</artifactId>
  <version>2.2.0</version>
  <name>hadoop-yarn-api</name>

  <properties>
    <!-- Needed for generating FindBugs warnings using parent pom -->
    <yarn.basedir>${project.parent.basedir}</yarn.basedir>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-maven-plugins</artifactId>
        <executions>
          <execution>
            <id>compile-protoc</id>
            <phase>generate-sources</phase>
            <goals>
              <goal>protoc</goal>
            </goals>
            <configuration>
              <protocVersion>${protobuf.version}</protocVersion>
              <protocCommand>${protoc.path}</protocCommand>
              <imports>
                <param>${basedir}/../../../hadoop-common-project/hadoop-common/src/main/proto</param>
                <param>${basedir}/src/main/proto</param>
                <param>${basedir}/src/main/proto/server</param>
              </imports>
              <source>
                <directory>${basedir}/src/main/proto</directory>
                <includes>
                  <include>yarn_protos.proto</include>
                  <include>yarn_service_protos.proto</include>
                  <include>applicationmaster_protocol.proto</include>
                  <include>applicationclient_protocol.proto</include>
                  <include>containermanagement_protocol.proto</include>
                  <include>server/yarn_server_resourcemanager_service_protos.proto</include>
                  <include>server/resourcemanager_administration_protocol.proto</include>
                </includes>
              </source>
              <output>${project.build.directory}/generated-sources/java</output>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

</project>

And after all this introduction, my question is how to get sbt to resolve these dependencies offline from the frozen cache.

I'll appreciate any help!

SBT version: 0.13.8

Thanks.

Upvotes: 3

Views: 3212

Answers (2)

Luis

Reputation: 159

I finally succeeded in running compile/package/assembly with sbt OFFLINE, using a subset of frozen libraries. To summarize the process, I will restate the description above a bit.

These are the steps to reproduce the issue:

  1. On a computer called ORIGIN, create an sbt project with Scala sources and compile it with an internet connection
  2. Halt internet connectivity and verify that the project keeps compiling
  3. Duplicate the sbt project, or copy it to a different computer (DESTINATION) without an internet connection
  4. Try to compile. It won't work, because sbt will try to download dependencies online and DESTINATION is an OFFLINE computer.

These are the steps to fix the issue:

  1. Assuming we are copying the sbt project to a new computer (DESTINATION) without internet connectivity, we have to make sure that the sbt and Scala versions on DESTINATION are the same as those on ORIGIN. If either version differs, then when running sbt on DESTINATION, sbt will try to download the proper versions, resulting in errors due to the lack of connection.
  2. If the sbt and Scala versions are the same, we have to copy the following folders from ORIGIN to DESTINATION:
    • ORIGIN: /home/userA/.ivy2
    • ORIGIN: /home/userA/.sbt/boot
  3. Make sure that the environment variables that point to sbt and Scala are properly configured
  4. Use a build.sbt file like this one:

build.sbt:

name := "ProjectNAME"
version := "1.1"
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "joda-time" % "joda-time" % "2.3" 
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.3.0" % "provided"

// Optional if you are using the assembly plugin
jarName in assembly := "ProjectoEclipseScala.jar"
// Optional to avoid that assembly includes scala
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
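The two assembly settings above assume the sbt-assembly plugin is enabled; with sbt 0.13 that is typically done in project/plugins.sbt. A sketch (the plugin version here is my assumption for the sbt 0.13 era, not from the answer):

```scala
// Hypothetical project/plugins.sbt entry enabling sbt-assembly.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
```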
  5. I'm not sure whether "provided" is compulsory, because the joda-time dependency is read properly without it
  6. BE AWARE that all the dependencies you use on DESTINATION must have been downloaded previously on ORIGIN and copied to DESTINATION.
  7. I tried to compile a simple project that only uses the Spark context (not spark-sql), so it should be able to compile with the single dependency:

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"

    However, we found that it DOES NOT COMPILE! sbt complains about the "jackson" package. Maybe "jackson" is deployed within the spark-sql dependency... In any case, including spark-sql makes the project compile/package/assembly.

FINAL COMMENT: If you still cannot compile after following this procedure, there is a "manual" alternative. I also managed to work offline WITHOUT the standalone sbt compiler, using Eclipse for Scala. In Eclipse you can select the dependencies manually in a graphical interface, choosing all the Spark, Hadoop, MapReduce... dependencies by hand. Once Eclipse recognizes these dependencies, it will compile your classes into the "workspace/eclipse_project_name/bin" folder. You can then pick them up and package them manually into a jar (a MANIFEST may be needed, but I guess it's not necessary). This jar can be spark-submitted to the cluster if all the dependencies are already running on the cluster.
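The cache copy in step 2 above can be sketched as a tarball transfer. This demonstration runs against a throwaway directory standing in for the ORIGIN home; the real paths (/home/userA/.ivy2 and /home/userA/.sbt/boot) come from the answer, everything else is an assumption:

```shell
# Sketch of bundling ~/.ivy2 and ~/.sbt/boot on ORIGIN for transfer.
set -e
origin=$(mktemp -d)                      # stands in for /home/userA on ORIGIN
mkdir -p "$origin/.ivy2/cache" "$origin/.sbt/boot"
# Bundle exactly the two folders sbt needs for offline resolution.
tar -czf /tmp/sbt-offline-caches.tar.gz -C "$origin" .ivy2 .sbt/boot
# On DESTINATION you would then run: tar -xzf sbt-offline-caches.tar.gz -C "$HOME"
tar -tzf /tmp/sbt-offline-caches.tar.gz  # list what was captured
```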

Upvotes: 0

Martin

Reputation: 506

I asked the unanswered question you reference in your post, and I'm happy to announce that it was answered a few days ago and the proposed solution worked for me.

Try updating to sbt 0.13.9-RC3 (follow the instructions at http://www.scala-sbt.org/release/tutorial/Manual-Installation.html and get the jar at https://dl.bintray.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.9-RC3/).
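For reference, a project can also pin the sbt version it expects in project/build.properties (a standard sbt convention), so the launcher picks up the intended release. A sketch:

```
sbt.version=0.13.9-RC3
```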

Best regards,

/Martin

Upvotes: 1
