Reputation: 3099
I want to save a Spark DataFrame in Delta format to S3, but for some reason the data is not saved. I debugged all the processing steps and the data was there; right before saving, I ran count on the DataFrame and it returned 24 rows. But as soon as save is called, no data appears in the target folder. What could be the reason for this?
This is how I save the data:
df
.select(schema)
.repartition(partitionKeys.map(new ColumnName(_)): _*)
.sortWithinPartitions(sortByKeys.map(new ColumnName(_)): _*)
.write
.format("delta")
.partitionBy(partitionKeys: _*)
.mode(saveMode)
.save("s3a://etl-qa/data_feed")
Upvotes: 0
Views: 1271
Reputation: 18475
There is a quickstart guide from Databricks that explains how to read from and write to a Delta Lake table.
If the DataFrame you are trying to save is called df, you need to execute:
df.write.format("delta").save(s3path)
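For a partitioned write like the one in the question, a minimal sketch could look like the following. The bucket name, partition column, sample data, and save mode are all placeholders, and it assumes the Delta Lake dependency is on the classpath and S3A credentials are already configured:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder()
  .appName("delta-write-example")
  // Session extensions required by recent Delta Lake versions.
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

// Sample DataFrame standing in for the questioner's df (24 rows, one partition value).
val df = spark.range(0, 24).withColumn("date", lit("2021-01-01"))

// Placeholder bucket/prefix; substitute your own S3 path.
val s3path = "s3a://my-bucket/data_feed"

// Partitioned Delta write, mirroring the structure of the question's snippet.
df.write
  .format("delta")
  .partitionBy("date")
  .mode("overwrite")
  .save(s3path)

// Read the path back and count the rows to confirm the write actually landed.
val written = spark.read.format("delta").load(s3path)
println(written.count()) // expect 24
If the write succeeded, the target folder will contain a _delta_log/ directory alongside the partition directories, so that is a quick place to check.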
Upvotes: 1