Reputation: 69
I am currently using PySpark on a local Windows 10 system. The PySpark code itself runs quite fast, but saving the PySpark dataframe to a CSV file takes a lot of time.
I am converting the PySpark dataframe to pandas and then saving it to a CSV file. I have also tried using the write method to save the CSV file.
Full_data.toPandas().to_csv("Level 1 - {} Hourly Avg Data.csv".format(yr), index=False)
Full_data.repartition(1).write.format('com.databricks.spark.csv').option("header", "true").save("Level 1 - {} Hourly Avg Data.csv".format(yr))
Both approaches took about an hour to save the CSV file. Is there a faster way to save a CSV file from a PySpark dataframe?
Upvotes: 6
Views: 4635
Reputation: 5700
In both of the reported examples you are reducing the level of parallelism.
In the first example, toPandas() is, computationally speaking, like calling collect(): you gather the entire dataframe into a collection on the driver, making the job single threaded.
In the second example you are calling repartition(1), which reduces the level of parallelism to 1, again making it single threaded.
Try repartition(2) instead (or 4 or 8... according to the number of execution threads available on your machine). That should produce results faster by leveraging Spark's parallelism, even though it will split the output into multiple files, one per partition.
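For instance, here is a minimal sketch of that approach; the repartition factor of 4, the overwrite mode, and the output path are assumptions to adapt to your setup, and Spark 2.x's built-in CSV writer is used in place of the external com.databricks.spark.csv package:

# Repartition to roughly the number of available execution threads,
# then write in parallel. Note the output path becomes a directory
# containing one part-*.csv file per partition.
Full_data.repartition(4) \
    .write \
    .option("header", "true") \
    .mode("overwrite") \
    .csv("Level 1 - {} Hourly Avg Data".format(yr))

If a single file is still required, the part files can be concatenated outside Spark afterwards, which keeps the write itself parallel.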
Upvotes: 8