tomek.xyz

Reputation: 139

Understanding how Spark applications use dependencies

Let's say we have a Spark application that writes to/reads from HDFS, and we have some additional dependency; let's call it dep.

Now, let's run spark-submit on our jar built with sbt. I know that spark-submit ships some jars (known as spark-libs). However, my questions are:
(1) How does the Spark version influence the shipped dependencies? I mean, what is the difference between spark-with-hadoop/bin/spark-submit and spark-without-hadoop/bin/spark-submit?
(2) How does the Hadoop version installed on the cluster (Hadoop cluster) influence the dependencies?
(3) Who is responsible for providing my dependency dep? Should I build a fat jar (assembly)?

Please note that the first two questions are about where the HDFS calls come from (I mean the calls made by my Spark application, like write/read).
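To make the setup concrete, this is the kind of submit I mean (the class and jar names are just placeholders):

```bash
# placeholder names, only to illustrate the setup
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  target/scala-2.12/my-app_2.12-0.1.jar
```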

Thanks in advance

Upvotes: 0

Views: 250

Answers (1)

OneCricketeer

Reputation: 191738

spark-without-hadoop refers only to the downloaded package, not to application development.

The more correct phrasing is "Bring your own Hadoop," meaning you are still required to have the base Hadoop dependencies for any Spark application.
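With the spark-without-hadoop ("Hadoop free") package, that bringing-your-own is done at launch time: per Spark's Hadoop-free build docs you point the launcher at the cluster's Hadoop jars, roughly like this (adjust the file location for your install):

```bash
# conf/spark-env.sh on the machine that runs spark-submit (Hadoop-free build only)
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```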

Should I build a fat jar (assembly)?

If you have libraries that are outside of hadoop-client and those provided by Spark (core, mllib, streaming), then yes.
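A minimal build.sbt sketch of that split, assuming the sbt-assembly plugin is added in project/plugins.sbt; all group IDs, artifact names, and versions below are placeholders:

```scala
// build.sbt -- sketch only
name := "my-spark-app"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // Supplied by the cluster's Spark/Hadoop installation at runtime,
  // so marked "provided" and NOT packed into the fat jar
  "org.apache.spark"  %% "spark-core"    % "3.3.2" % "provided",
  "org.apache.spark"  %% "spark-sql"     % "3.3.2" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "3.3.4" % "provided",

  // The extra dependency "dep" from the question: nothing on the cluster
  // ships this for you, so it goes into the assembly
  "com.example" %% "dep" % "1.0.0"
)
```

Running `sbt assembly` then produces a single jar that contains dep (but not Spark or Hadoop classes), and that is the jar you pass to spark-submit. Alternatively, you can leave dep out of the jar and supply it at submit time with --jars or --packages.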

Upvotes: 0
