Cassie

Reputation: 3099

Spark DataFrame is not saved in Delta format

I want to save a Spark DataFrame in Delta format to S3, but for some reason the data is not saved. I debugged all the processing steps and the data was there; right before saving, I ran count on the DataFrame and it returned 24 rows. But as soon as save is called, no data appears in the resulting folder. What could be the reason for this?
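For reference, this is the check I ran right before the write (just a minimal sketch; df is the same DataFrame that is written below):

// This printed 24 immediately before the save, so the DataFrame is not empty at that point.
println(s"row count before save: ${df.count()}")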

This is how I save the data:

df
  .select(schema)                                               // project the output columns
  .repartition(partitionKeys.map(new ColumnName(_)): _*)        // repartition by the partition key columns
  .sortWithinPartitions(sortByKeys.map(new ColumnName(_)): _*)  // sort rows within each partition by the sort keys
  .write
  .format("delta")
  .partitionBy(partitionKeys: _*)                               // partition the output directories by the same keys
  .mode(saveMode)
  .save("s3a://etl-qa/data_feed")

Upvotes: 0

Views: 1271

Answers (1)

Michael Heil

Reputation: 18475

There is a Delta Lake quickstart from Databricks that explains how to read from and write to a Delta table.

If the DataFrame you are trying to save is called df, you need to execute:

df.write.format("delta").save(s3path)
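For a fuller picture, here is a minimal end-to-end sketch based on that quickstart (the session settings are the standard Delta Lake ones; the path and the sample data are placeholders, not taken from the question):

import org.apache.spark.sql.SparkSession

// Delta Lake needs its SQL extension and catalog registered on the session,
// and the delta-spark package on the classpath.
val spark = SparkSession.builder()
  .appName("delta-write-example")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

import spark.implicits._

val s3path = "s3a://my-bucket/data_feed"  // placeholder path

// Write a small DataFrame as a Delta table.
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
df.write.format("delta").mode("overwrite").save(s3path)

// Reading the path back confirms the table was written; a successful Delta write
// also creates a _delta_log directory under the target path.
val readBack = spark.read.format("delta").load(s3path)
println(readBack.count())

The read-back at the end is only there to verify the write; the write itself is the same one-liner shown above.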

Upvotes: 1
