Reputation: 157
I don't have much experience with Maven and Spark, but everything I've done so far has been in Scala. Now I have to develop a project in PySpark, and I was wondering whether it's possible to create a PySpark project using Maven, and if so, how I would have to build the pom file.
So far, for example, I've specified these properties in the pom:
<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <maven.assembly.plugin.version>3.1.0</maven.assembly.plugin.version>
    <maven.antrun.plugin.version>1.8</maven.antrun.plugin.version>
    <maven.surefire.plugin.version>3.0.0-M5</maven.surefire.plugin.version>
    <maven.surefire.report.plugin.version>2.18.1</maven.surefire.report.plugin.version>
    <maven.shade.plugin.version>3.1.1</maven.shade.plugin.version>
    <maven.site.plugin.version>3.6</maven.site.plugin.version>
    <maven.project.info.reports.plugin.version>2.2</maven.project.info.reports.plugin.version>
    <scala.maven.plugin.version>4.1.1</scala.maven.plugin.version>
    <maven.scalastyle.plugin.version>1.0.0</maven.scalastyle.plugin.version>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.12</scala.version>
    <spark.version>2.4.0.cloudera2</spark.version>
    <hive-service.version>3.1.2</hive-service.version>
    <spark.databricks.version>1.5.0</spark.databricks.version>
    ...
</properties>
Would it work the same way, just replacing <scala.version>2.11.12</scala.version> with <python.version>3.6</python.version>? Or something like that?
Upvotes: 1
Views: 1306
Reputation: 1419
The languages supported by Spark are Scala, Java, Python, R, and SQL.
The general form of the spark-submit command is:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
You can explore the language support in more detail at https://spark.apache.org/.
Each language has its own build and deploy strategy.
E.g., for Java/Scala you can use Gradle or Maven for the build, which produces a jar file that you can run on any machine that has Java and Spark set up:
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100
For Python, you can use PyBuilder to build a zip file, or you can build an egg or create a wheel distribution file, which can then be used in the spark-submit command (see the example after the option list below).
Simply pass a .py file in the place of <application-jar>, and add Python .zip, .egg or .py files to the search path with --py-files:
  --py-files PY_FILES    Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.
  --class CLASS_NAME     Your application's main class (for Java / Scala apps).
  --name NAME            A name of your application.
  --jars JARS            Comma-separated list of jars to include on the driver and executor classpaths.
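For example, a minimal Python equivalent of the SparkPi submission above, assuming a hypothetical entry-point script main.py with its helper modules zipped into deps.zip:

./bin/spark-submit \
  --master local[8] \
  --py-files deps.zip \
  main.py \
  100    # application arguments, if your script takes any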
Upvotes: 1
Reputation: 183
To work on a PySpark project you need a setup.py; you may refer to the documentation on packaging Python applications. In the setup.py you list the dependencies, and to create an artifact you can build a wheel file. The wheel file can then be part of the spark-submit command.
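As a rough sketch (the package name myjob and the dependency are hypothetical placeholders, not anything Spark prescribes), a minimal setup.py could look like this:

# setup.py -- minimal sketch; name, version and dependencies are placeholders
from setuptools import setup, find_packages

setup(
    name="myjob",
    version="0.1.0",
    packages=find_packages(),
    # runtime dependencies declared here end up in the wheel's metadata
    install_requires=["requests"],
)

Building it with python setup.py bdist_wheel (the wheel package must be installed) produces something like dist/myjob-0.1.0-py3-none-any.whl. Since a pure-Python wheel is just a zip archive, it can typically be shipped to the executors with --py-files alongside a small driver script, even though the spark-submit help text only lists .zip, .egg and .py.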
Upvotes: 0