Reputation: 357
Since version 2.0, Apache Spark ships with a "jars" folder full of .jar files. Nevertheless, Maven downloads all of these jars again when issuing:
mvn -e package
because in order to submit an application with
spark-submit --class DataFetch target/DataFetch-1.0-SNAPSHOT.jar
the DataFetch-1.0-SNAPSHOT.jar is needed.
So, the first question is straightforward: how can I take advantage of these existing jars? The second question is related: when I tried for the first time, with Maven downloading the jars, I got the following output:
[INFO] Error stacktraces are turned on.
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building "DataFetch" 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ DataFetch ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /root/sparkTests/scalaScripts/DataFetch/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.0:compile (default-compile) @ DataFetch ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ DataFetch ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /root/sparkTests/scalaScripts/DataFetch/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.0:testCompile (default-testCompile) @ DataFetch ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ DataFetch ---
[INFO] No tests to run.
[INFO] Surefire report directory: /root/sparkTests/scalaScripts/DataFetch/target/surefire-reports
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Results :
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] --- maven-jar-plugin:2.3.2:jar (default-jar) @ DataFetch ---
[WARNING] JAR will be empty - no content was marked for inclusion!
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.294s
[INFO] Finished at: Wed Sep 28 17:41:29 PYT 2016
[INFO] Final Memory: 14M/71M
[INFO] ------------------------------------------------------------------------
And here is my pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.spark.pg</groupId>
    <artifactId>DataFetch</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>"DataFetch"</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
            </plugin>
        </plugins>
    </build>
</project>
If more information is needed, please don't hesitate to ask for it.
Upvotes: 1
Views: 2757
Reputation: 5259
I am not sure whether I understand your problem, but I'll try to answer.
Based on the Spark "Bundling Your Application's Dependencies" documentation:
When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime.
You can set the scope to provided in your Maven pom.xml file:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
    <!-- add this scope -->
    <scope>provided</scope>
</dependency>
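Note that ${spark.version} is a Maven property that has to be defined in the <properties> section of your pom.xml; assuming Spark 2.0.0 as in your original pom, it could look like:

```xml
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- referenced as ${spark.version} in the dependency above -->
    <spark.version>2.0.0</spark.version>
</properties>
```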
The second thing I noticed is that the Maven build creates an empty JAR:
[WARNING] JAR will be empty - no content was marked for inclusion!
If you have any other dependencies, you should package them into the final jar archive file.
You can add something like the below to pom.xml and run mvn package:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.6</version>
    <configuration>
        <!-- package with project dependencies -->
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <mainClass>YOUR_MAIN_CLASS</mainClass>
            </manifest>
        </archive>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>
The Maven log should then print a line about building the jar:
[INFO] --- maven-assembly-plugin:2.4.1:single (make-assembly) @ dateUtils ---
[INFO] Building jar: path/target/APPLICATION_NAME-jar-with-dependencies.jar
After the Maven package phase, you should see DataFetch-1.0-SNAPSHOT-jar-with-dependencies.jar in the target folder, and you can submit this jar with spark-submit.
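For example, the submission could look like this (the fully qualified main class com.spark.pg.DataFetch is an assumption based on the groupId and artifactId in your pom; replace it with your actual main class):

```
spark-submit \
  --class com.spark.pg.DataFetch \
  target/DataFetch-1.0-SNAPSHOT-jar-with-dependencies.jar
```

Since spark-core is marked as provided, it is not packed into the assembled jar; the cluster supplies it at runtime from its own jars folder.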
Upvotes: 1