Reputation: 12294
I've built a jar that I can use from pyspark by adding it to ${SPARK_HOME}/jars
and calling it using
spark._sc._jvm.com.mypackage.myclass.mymethod()
However, what I'd like to do is bundle that jar into a Python wheel so someone can pip install the jar into their running pyspark/jupyter session. I'm not very familiar with Python packaging; is it possible to distribute jars inside a wheel and have that jar be automatically available to pyspark?
I want to put a jar inside a wheel or egg (not even sure if I can do that?) and, upon installation of said wheel/egg, put that jar in a place where it will be available to the JVM.
I guess what I'm really asking is: how do I make it easy for someone to install a third-party jar and use it from pyspark?
Upvotes: 2
Views: 213
Reputation: 1525
As you mentioned above, I hope you have already used the --jars option and are able to call the function from pyspark. If I understand your requirement correctly, you want to add this jar to the install package so that the jar library is available on each node of the cluster.
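For reference, attaching the jar per session (rather than copying it into ${SPARK_HOME}/jars) can be done on the command line or when the SparkSession is built. This is only a sketch; the jar path is a placeholder and the class name is the one from the question.

# Attach the jar when launching a shell or job:
#   pyspark --jars /path/to/mylib.jar
#   spark-submit --jars /path/to/mylib.jar my_script.py

from pyspark.sql import SparkSession

# Or set spark.jars before the driver JVM starts (path is a placeholder).
spark = (
    SparkSession.builder
    .appName("jar-demo")
    .config("spark.jars", "/path/to/mylib.jar")
    .getOrCreate()
)

# With the jar on the classpath, the py4j call from the question works as before:
spark._sc._jvm.com.mypackage.myclass.mymethod()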
There is a page in the Databricks documentation that talks about uploading third-party jar files alongside Python egg/wheel installs. See if that covers the information you are looking for.
https://docs.databricks.com/libraries.html#upload-a-jar-python-egg-or-python-wheel
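To address the "jar inside a wheel" part directly: a jar can be shipped as package data inside a wheel, and the package can expose the installed jar's path so it can be passed to spark.jars when the session is created. A minimal sketch, assuming a hypothetical package name mypysparklib and jar file mylib.jar:

# setup.py -- bundle the jar as package data inside the wheel
from setuptools import setup, find_packages

setup(
    name="mypysparklib",                                # hypothetical name
    version="0.1.0",
    packages=find_packages(),
    package_data={"mypysparklib": ["jars/mylib.jar"]},  # ship the jar
    include_package_data=True,
)

# mypysparklib/__init__.py -- helper that locates the bundled jar
import os

def jar_path():
    """Return the absolute path of the jar installed with this package."""
    return os.path.join(os.path.dirname(__file__), "jars", "mylib.jar")

# user code -- point spark.jars at the bundled jar before creating the session
from pyspark.sql import SparkSession
import mypysparklib

spark = (
    SparkSession.builder
    .config("spark.jars", mypysparklib.jar_path())
    .getOrCreate()
)
spark._sc._jvm.com.mypackage.myclass.mymethod()

Note that spark.jars is read when the driver JVM starts, so this only helps for sessions created after the wheel is installed; it will not inject the jar into an already running session.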
Upvotes: 1