Reputation: 97
The DataFrame is created using the Scala API for Spark:
val someDF = spark.createDataFrame(
  spark.sparkContext.parallelize(someData),
  StructType(someSchema)
)
I want to convert this to a Pandas DataFrame. PySpark provides .toPandas()
to convert a Spark DataFrame to Pandas, but there is no equivalent for Scala
(that I can find).
Please help me in this regard.
Upvotes: 0
Views: 2318
Reputation: 1100
To convert a Spark DataFrame into a Pandas DataFrame, you can set spark.sql.execution.arrow.enabled
to true, then create the DataFrame as usual and convert it with .toPandas(), which Arrow will accelerate.
Note that .toPandas() only exists in PySpark, so this code has to run in Python:

spark.conf.set("spark.sql.execution.arrow.enabled", "true")
some_df = spark.createDataFrame(some_data, some_schema)  # data and schema as in the question
result_pdf = some_df.select("*").toPandas()

The conversion above runs through Arrow because spark.sql.execution.arrow.enabled
is set to true.
Hope this helps!
Upvotes: 2
Reputation: 4481
In Spark, a DataFrame is just an abstraction over data; the most common data sources are files in a file system. When PySpark converts a DataFrame to the Pandas format, it simply translates its own abstraction over the data into another abstraction from a different Python framework. You can't do that conversion in Scala, because Pandas is a Python library for working with data while Spark is not, and you would run into the difficulties of integrating Python and Scala. The simplest thing you can do here is hand the data off through an intermediate format that both sides can read:
Upvotes: 1