Reputation: 31
I am trying to cache a PySpark DataFrame with 3 columns and 27 rows, and the process takes around 7-10 seconds.
Is there any way to accelerate this job?
Thanks in advance!
Upvotes: 0
Views: 139
Reputation: 349
You could try either of the following approaches:

1. Coalesce the DataFrame to a single partition and then cache it:

    df.coalesce(1)

2. Enable Arrow-based columnar data transfers:

    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")
Upvotes: 2