Reputation:
I want to use Pandas transformations such as melt inside a Spark application. I am using Scala for Spark, and I need functionality like pd.melt() from Pandas. Is it possible to do that?
I have seen Pandas and PySpark used together in notebooks.
Upvotes: 4
Views: 975
Reputation: 87299
(It's hard to provide an example without more details, so this answer just includes links to documentation, etc.)
Recent versions of Spark (2.3 and later) support so-called Pandas UDFs in PySpark, where your function receives a Pandas Series or DataFrame as its argument and returns a Series or DataFrame, so you can call Pandas functions to compute the result. Pandas UDFs are much faster than traditional Python UDFs because data is exchanged in batches using Arrow-based serialization instead of row by row. See documentation and this blog post for more details.
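As a rough illustration of the Pandas UDF route, here is a minimal sketch that applies pd.melt() to each group of a Spark DataFrame through applyInPandas (available in Spark 3.0+). The column names id, a and b, the helper names melt_group and melt_on_spark, and the output schema are all assumptions made up for this example; adjust them to your data.

```python
import pandas as pd

# Hypothetical wide columns to unpivot; replace with your own column names.
VALUE_COLUMNS = ["a", "b"]


def melt_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Plain Pandas function: unpivot one group from wide to long format."""
    return pd.melt(
        pdf,
        id_vars=["id"],
        value_vars=VALUE_COLUMNS,
        var_name="variable",
        value_name="value",
    )


def melt_on_spark(spark_df):
    """Run melt_group on every 'id' group of a PySpark DataFrame.

    Assumes spark_df has a long column 'id' and double columns 'a' and 'b'.
    The schema string must describe the DataFrame that melt_group returns.
    """
    return spark_df.groupBy("id").applyInPandas(
        melt_group,
        schema="id long, variable string, value double",
    )
```

Because melt_group is an ordinary Pandas function, it can be checked locally on a small Pandas DataFrame before it is handed to Spark. Note that this runs in PySpark, not in a Scala application.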
Another alternative is to use Koalas, a library for Spark that reimplements the Pandas API on top of Spark. It includes an implementation of melt as well, but make sure to read the documentation to understand possible differences in behavior.
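Since Koalas mirrors the Pandas API, the same melt call can be written once and used with either kind of DataFrame. The sketch below shows this with a plain Pandas DataFrame so it runs without a cluster; the column names and the helper melt_wide are assumptions for illustration only.

```python
import pandas as pd


def melt_wide(df):
    """Unpivot hypothetical columns 'a' and 'b', keeping 'id' as identifier.

    Works on any DataFrame that exposes a Pandas-style melt method,
    which includes Pandas itself and Koalas DataFrames.
    """
    return df.melt(
        id_vars=["id"],
        value_vars=["a", "b"],
        var_name="variable",
        value_name="value",
    )


# Local check with plain Pandas.
local = pd.DataFrame({"id": [1, 2], "a": [1.0, 2.0], "b": [3.0, 4.0]})
print(melt_wide(local))
```

With Koalas installed, something like ks.from_pandas(local) after import databricks.koalas as ks should give a distributed DataFrame that accepts the same call; since Spark 3.2 the same API ships with Spark as pyspark.pandas. One difference to expect is that the row order of the result is not guaranteed to match Pandas.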
Upvotes: 1