Reputation: 17676
I have a Scala library which contains some utility code and UDFs for the Scala Spark API.
However, I would now like to use this Scala library from PySpark. Using Java-based classes seems to work reasonably well, as outlined in Running custom Java class in PySpark. However, since the library is written in Scala, the names of some classes might not be straightforward and may contain characters like $.
How is interoperability still possible?
How can I use Java/Scala code which is offering a function requiring a generic type parameter?
Upvotes: 0
Views: 853
Reputation: 4631
In general you don't. While access in such cases is sometimes possible using __getattribute__ / getattr, Py4j is simply not designed with Scala in mind (this is not really Python-specific: while Scala is technically interoperable with Java, it is a much richer language, and many of its features are not easily accessible from other JVM languages).
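For the `$` names from the question, here is a minimal sketch of the getattr route. It assumes the usual compiler encoding of a Scala `object`: a class whose name ends in `$`, with a static `MODULE$` field holding the singleton. The helper `scala_object` and the name `com.example.Utils` are hypothetical, and `jvm` stands for a Py4j JVM view (in PySpark typically the private `spark.sparkContext._jvm`).

```python
from functools import reduce


def scala_object(jvm, fqcn):
    """Resolve the singleton behind a Scala `object` through a Py4j JVM view.

    Assumption: `object Utils` in package `com.example` is compiled to a
    class `com.example.Utils$` with a static field `MODULE$`.
    """
    *packages, name = fqcn.split(".")
    # Walk the package path one attribute at a time: jvm.com.example
    package = reduce(getattr, packages, jvm)
    # `Utils$` is not a valid Python identifier, so plain dot access fails;
    # getattr with a string is the only way to spell it.
    companion_class = getattr(package, name + "$")
    return getattr(companion_class, "MODULE$")
```

Usage would look like `utils = scala_object(spark.sparkContext._jvm, "com.example.Utils")`. This may be enough for simple methods, but it does nothing about arguments or return values that Py4j cannot convert.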
In practice you should do the same thing that Spark does internally: instead of exposing the Scala API directly, you create a lean* Java or Scala API which is specifically designed for interoperability with guest languages. Since Py4j provides translation only between basic Python and Java types, and doesn't handle commonly used Scala interfaces, you will need such an intermediate layer anyway, unless the Scala library was specifically designed for Java interoperability.
As for your last concern:
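To illustrate what the Python side of such a layer could look like, here is a rough sketch. Everything named here is an assumption: `com.example.PythonFacade` and its `addScoreColumn` method are a hypothetical facade you would write yourself on the JVM side, exposing only a Java DataFrame, a string and a double.

```python
def add_score_column(jvm, jdf, column, threshold):
    """Call a hypothetical JVM facade using only Py4j-friendly arguments.

    jvm       -- Py4j JVM view (in PySpark: spark.sparkContext._jvm)
    jdf       -- Java-side DataFrame handle (in PySpark: df._jdf)
    column    -- passed as a plain string
    threshold -- passed as a plain double
    """
    # PythonFacade is assumed, not part of Spark or of any real library.
    facade = jvm.com.example.PythonFacade
    # Coerce to basic types so no Scala-specific conversion is needed.
    return facade.addScoreColumn(jdf, str(column), float(threshold))
```

The result is still a Java object, so in PySpark it would have to be wrapped in `pyspark.sql.DataFrame` again; the constructor arguments for that differ between Spark versions, which is why the wrapping is left out here.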
How can I use Java/Scala code which is offering a function requiring a generic type parameter?
Py4j can handle Java generics just fine without any special treatment. Advanced Scala features (manifests, class tags, type tags) are typically a no-go, but once again, these are not designed with Java interoperability in mind (though using them from Java is possible).
* As a rule of thumb, if something is Java friendly (doesn't require any crazy hacks, extensive type conversions, or filling in the blanks normally handled by the Scala compiler), it should be a good fit for PySpark as well.
Upvotes: 2