Reputation: 1346
I'm having difficulty referencing a newly created Delta table in order to perform an upsert/merge on it. Writing the table via PySpark with the usual dataframe.write.format("delta") approach works fine.
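For context, the working write-based approach looks roughly like this (the path and schema here are placeholders, not my real ones):

from pyspark.sql import SparkSession

# assumes the session is already configured with the Delta Lake extensions
spark = SparkSession.builder.getOrCreate()

# writing any DataFrame with the delta format creates the table at the path
df = spark.createDataFrame([(1, "a")], ["id", "value"])
df.write.format("delta").mode("overwrite").save("/path/to/table")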
When I instead create the table manually with the Delta table builder API's create syntax,
from delta.tables import DeltaTable

deltaTable = (
    DeltaTable.createIfNotExists(spark)
    .location("/path/to/table")
    .tableName("table")
    .addColumn("id", dataType="String")
    # ... more .addColumn(...) calls elided ...
    .execute()
)
I can see that the folder exists in storage as expected, and I can verify that it's a Delta table using DeltaTable.isDeltaTable(spark, tablePath).
The problem comes when I run someTable = DeltaTable.forPath(spark, tablePath), which fails with:
pyspark.sql.utils.AnalysisException: A partition path fragment should be the form like 'part1=foo/part2=bar'
Whether or not I explicitly partition the table in the create statement doesn't seem to matter, and I am trying to read the whole table, not a single partition.
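For reference, the partitioned variant I tried is a sketch like the following (the region column is illustrative, not my real schema):

deltaTable = (
    DeltaTable.createIfNotExists(spark)
    .location("/path/to/table")
    .tableName("table")
    .addColumn("id", dataType="String")
    .addColumn("region", dataType="String")
    .partitionedBy("region")  # partitioning on an illustrative column
    .execute()
)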
So the question is, how do I reference the table correctly to load and manage it?
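To make the goal concrete, this is the kind of upsert I'm hoping to run once the table reference works (updates_df and the join condition are illustrative):

someTable = DeltaTable.forPath(spark, tablePath)
(
    someTable.alias("target")
    .merge(updates_df.alias("source"), "target.id = source.id")
    .whenMatchedUpdateAll()     # update rows that already exist
    .whenNotMatchedInsertAll()  # insert rows that don't
    .execute()
)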
I'm using Azure Data Lake Storage Gen2, though I'm not sure whether that's part of the issue.
In case it matters, the full path I use for location is abfss://container_name@storage_account_name.dfs.core.windows.net/blobContainerName/delta/tables/nws, where nws has business meaning.
Upvotes: 0
Views: 1002