manish dev

Reputation: 97

Can a Spark DataFrame (Scala) be converted to a pandas DataFrame (Python)?

The DataFrame is created using the Scala API for Spark:

val someDF = spark.createDataFrame( spark.sparkContext.parallelize(someData), StructType(someSchema) )

I want to convert this to a pandas DataFrame.

PySpark provides .toPandas() to convert a Spark DataFrame to pandas, but there is no equivalent for Scala (that I can find).

Please help me in this regard.

Upvotes: 0

Views: 2318

Answers (2)

gsthina

Reputation: 1100

To convert a Spark DataFrame into a pandas DataFrame, set spark.sql.execution.arrow.enabled to true, read or create the DataFrame using Spark as usual, and then convert it to a pandas DataFrame using Arrow:

  1. Enable Arrow: spark.conf.set("spark.sql.execution.arrow.enabled", "true")
  2. Create the DataFrame using Spark like you did:
    val someDF = spark.createDataFrame( spark.sparkContext.parallelize(someData), StructType(someSchema) )
  3. Convert it to a pandas DataFrame (this step is on the PySpark side):
    result_pdf = someDF.select("*").toPandas()

The above commands run using Arrow because the config spark.sql.execution.arrow.enabled is set to true.

Hope this helps!

Upvotes: 2

Boris Azanov

Reputation: 4481

In Spark, a DataFrame is just an abstraction over data; the most common sources of that data are files on a file system. When PySpark converts a DataFrame to pandas format, it is simply translating its own abstraction over the data into the abstraction of another Python framework. You can't do that conversion from Scala, because pandas is a Python library for working with data while Spark is not, and you would run into difficulties integrating Python and Scala directly. The simplest thing you can do here is:

  1. Write the DataFrame to the file system from Scala Spark.
  2. Read the data from the file system using pandas.
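The two steps above can be sketched as follows. This is a minimal sketch, assuming a running SparkSession `spark` and the `someDF` from the question; the path `/tmp/someDF.parquet` is just an illustrative choice. Parquet is a good intermediate format here because it preserves the schema and pandas can read it back (via pyarrow or fastparquet).

```scala
// Step 1 (Scala side): write the DataFrame out in a format pandas can read.
someDF.write
  .mode("overwrite")              // replace the output if it already exists
  .parquet("/tmp/someDF.parquet") // illustrative path, not from the question

// Step 2 (Python side), shown as a comment since this answer's code is Scala:
//   import pandas as pd
//   pdf = pd.read_parquet("/tmp/someDF.parquet")
```

Note that Spark writes the output as a directory of part files; pandas' read_parquet accepts such a directory path when pyarrow is the engine.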

Upvotes: 1
