I'm new to Scala and I don't know the right terminology to ask this kind of question. I have a dataframe:
id VehicleID Longitude Latitude Date Distance
1 12311 55.55431 25.45631 01/02/2020 20
2 12311 55.55432 25.45634 01/02/2020 80
3 12311 55.55433 25.45637 02/02/2020 10
4 12311 55.55431 25.45621 02/02/2020 50
5 12309 55.55427 25.45627 01/02/2020 30
6 12309 55.55436 25.45655 02/02/2020 20
7 12412 55.55441 25.45657 01/02/2020 14
8 12412 55.55442 25.45656 02/02/2020 60
I want to save each partition (grouped by VehicleID and Date) in a different file. For example:
Save this in an xlsx file named 12311_01/02/2020:
id VehicleID Longitude Latitude Date Distance
1 12311 55.55431 25.45631 01/02/2020 20
2 12311 55.55432 25.45634 01/02/2020 80
Save this in an xlsx file named 12311_02/02/2020:
id VehicleID Longitude Latitude Date Distance
3 12311 55.55433 25.45637 02/02/2020 10
4 12311 55.55431 25.45621 02/02/2020 50
Save this in an xlsx file named 12309_01/02/2020:
id VehicleID Longitude Latitude Date Distance
5 12309 55.55427 25.45627 01/02/2020 30
I managed to save everything in a single file, but I cannot save it in several files according to the partition:
df.cache().write.format("com.crealytics.spark.excel").option("header",true).save("test.xlsx")
Is there any possibility to do that in Spark with Scala?
Thanks.
Upvotes: 1
Views: 2706
Reputation: 39930
In order to partition your dataframe on two columns, all you have to do is call partitionBy() when writing; Spark will then create a separate output directory for every distinct (VehicleID, Date) combination.
The following should do the trick for you:
df.write
  .partitionBy("VehicleID", "Date")
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .save("test.xlsx")
Alternatively, if you want the output in csv format:
df.write
  .partitionBy("VehicleID", "Date")
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("test.csv")
Upvotes: 2