Patterson

Reputation: 2821

Unable to read Databricks Delta / Parquet File with Delta Format

I am trying to read a Delta/Parquet file in Databricks using the following code:

df3 = spark.read.format("delta").load('/mnt/lake/CUR/CURATED/origination/company/opportunities_final/curorigination.presentation.parquet')

However, I'm getting the following error:

A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path: curorigination.presentation.parquet

This seemed very straightforward, so I'm not sure why I'm getting the error.

Any thoughts?

The file structure looks like the following: [screenshot of the mounted directory listing]

Upvotes: 1

Views: 10052

Answers (2)

Jonathan

Reputation: 2043

The error shows that Delta Lake thinks you have an invalid partition path: when the Delta reader is pointed below the table root, it tries to parse the extra path segments as partition fragments of the form part1=foo/part2=bar, and curorigination.presentation.parquet doesn't match that form.

If your Delta table has partition columns, for example year, month, and day, the path of an individual file will look like

/mnt/lake/CUR/CURATED/origination/company/opportunities_final/year=yyyy/month=mm/day=dd/curorigination.presentation.parquet

and you just need to load the table root:

df = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")
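For context, a layout like year=yyyy/month=mm/day=dd is produced when the table is written with partitionBy. A minimal sketch of such a write (the partition column names here are hypothetical, and df stands for whatever dataframe is being written; substitute the columns your table is actually partitioned by):

# Hypothetical example: writing a Delta table partitioned by year/month/day,
# which creates the year=yyyy/month=mm/day=dd directory layout shown above
(df.write
   .format("delta")
   .partitionBy("year", "month", "day")
   .mode("overwrite")
   .save("/mnt/lake/CUR/CURATED/origination/company/opportunities_final"))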

If you want to read it as plain parquet instead, you can do

df = spark.read.parquet("/mnt/lake/CUR/CURATED/origination/company/opportunities_final")

because you don't need to pass the path of an individual parquet file; pointing at the directory is enough.
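If you're not sure whether the directory actually holds a Delta table, one way to check is DeltaTable.isDeltaTable from the delta-spark API (a minimal sketch; it assumes the Delta Lake library is available on the cluster, which it is on Databricks):

from delta.tables import DeltaTable

# True only if the path contains a valid Delta table (i.e. a _delta_log)
path = "/mnt/lake/CUR/CURATED/origination/company/opportunities_final"
print(DeltaTable.isDeltaTable(spark, path))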

Upvotes: 3

Vamsi Bitra

Reputation: 2764

The above error mainly happens because of the incorrect path fragment curorigination.presentation.parquet. Please check your Delta location, and also check whether the Delta files were actually created:

%fs ls /mnt/lake/CUR/CURATED/origination/company/opportunities_final/  
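You can do the same check from Python; a Delta table directory will contain a _delta_log folder (a minimal sketch using dbutils, which is available in Databricks notebooks):

# List the table directory and check for the _delta_log folder that every
# Delta table contains; the path is the one from the question
files = dbutils.fs.ls("/mnt/lake/CUR/CURATED/origination/company/opportunities_final/")
print(any(f.name.rstrip("/") == "_delta_log" for f in files))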

I reproduced the same thing in my environment. First, I created a dataframe from a parquet file:

df1 = spark.read.format("parquet").load("/FileStore/tables/")
display(df1)


After that, I converted the parquet data to Delta format and saved it to the location /mnt/lake/CUR/CURATED/origination/company/opportunities_final/demo_delta1:

df1.coalesce(1).write.format('delta').mode("overwrite").save("/mnt/lake/CUR/CURATED/origination/company/opportunities_final/demo_delta1")


# Reading the delta file
df3 = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/origination/company/opportunities_final/demo_delta1")
display(df3)


Upvotes: 0
