Kaleigh Spitzer

Reputation: 21

Delta Live Tables saving as corrupt files

I'm currently implementing an ETL pipeline using Databricks Delta Live Tables. I specified the storage location as a folder in ADLS. When I run the pipeline and look at the files, the .snappy.parquet files saved to ADLS appear to contain Unicode characters. I am using very small CSV files (around 5 rows each) that don't have any null values or special characters. Has anyone run into this issue, or does anyone know how to solve it?
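For context, here's a minimal sketch of the kind of table definition my pipeline uses (the path and table name below are placeholders, not my actual values):

import dlt

@dlt.table(comment="Raw ingest of a small CSV file")
def raw_customers():
    # Placeholder ADLS path; the real pipeline reads several ~5-row CSV files
    return (
        spark.read
        .option("header", "true")
        .csv("abfss://<container>@<account>.dfs.core.windows.net/raw/customers/")
    )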

What I've tried:

Upvotes: 1

Views: 196

Answers (1)

Bhavani

Reputation: 5317

When I tried to view the Delta table, I encountered the same issue as shown below:

[screenshot: raw contents of a .snappy.parquet file displayed as unreadable Unicode/binary characters]

The data is displayed as Unicode/binary characters. According to this, the underlying data of a Delta table is stored in the compressed Parquet file format, i.e., as snappy.parquet files.

As per this, Parquet is a binary (rather than text-based) file format optimized for machines, so Parquet files aren't directly human-readable. That is likely why the data appears as Unicode characters, as above. To view the data of a snappy.parquet file, read it in Databricks using the code below:

# Read the Delta table from its storage location and display its rows
df = spark.read.format("delta").load("<deltaTablePath>")
df.show()
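For a storage location in ADLS as described in the question, the path would be the abfss URI of the pipeline's storage folder plus the table name. The account, container, and folder names below are placeholders, and the tables/ subfolder is an assumption about the layout; confirm the exact structure in your own storage account:

# Hypothetical ADLS Gen2 path; substitute your own container, storage account,
# pipeline storage folder, and table name
delta_path = (
    "abfss://<container>@<storageaccount>.dfs.core.windows.net"
    "/<pipeline-storage-folder>/tables/<table_name>"
)

df = spark.read.format("delta").load(delta_path)
df.show()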

Then we can view the data of the Delta table as shown below:

[screenshot: df.show() output displaying the Delta table's rows in readable tabular form]

Alternatively, read the file using Parquet reading tools or upload it to an online Parquet viewer as shown below:

[screenshot: the same parquet file opened in an online Parquet viewer, showing the data correctly]
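For example, a quick sketch of reading one part file with pandas after downloading it from ADLS (the filename is a placeholder, and this assumes the pyarrow or fastparquet package is installed):

import pandas as pd

# Read a single downloaded .snappy.parquet part file (placeholder filename)
df = pd.read_parquet("part-00000-<uuid>.snappy.parquet")
print(df.head())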

Upvotes: 0
