Reputation: 303
I first set up a Delta Live Tables pipeline using Python as follows:
import dlt

@dlt.table
def transaction():
    # Incrementally ingest parquet files with Auto Loader
    return (
        spark
        .readStream
        .format("cloudFiles")
        .schema(transaction_schema)
        .option("cloudFiles.format", "parquet")
        .load(path)
    )
I wrote the Delta Live Table to the target database test using this pipeline configuration:
{
  "id": <id>,
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 5
      }
    }
  ],
  "development": true,
  "continuous": false,
  "edition": "core",
  "photon": false,
  "libraries": [
    {
      "notebook": {
        "path": <path>
      }
    }
  ],
  "name": "dev pipeline",
  "storage": <storage>,
  "target": "test"
}
Everything worked as expected in the first trial.
After a while, I noticed that I had forgotten to add a partition column to the table, so I dropped the table in test with DROP TABLE test.transaction and updated the notebook to:
import dlt
from pyspark.sql import functions as F

@dlt.table(
    partition_cols=["partition"],
)
def transaction():
    return (
        spark
        .readStream
        .format("cloudFiles")
        .schema(transaction_schema)
        .option("cloudFiles.format", "parquet")
        .load(path)
        # Derive a date partition column from the event timestamp
        .withColumn("partition", F.to_date("timestamp"))
    )
However, when I ran the pipeline again, I got an error:
org.apache.spark.sql.AnalysisException: Cannot change partition columns for table transaction.
Current:
Requested: partition
It looks like I can't change the partition columns just by dropping the target table.
What is the proper way to change partition columns in delta live tables?
Upvotes: 3
Views: 3017
Reputation: 87164
If you have changed the partitioning schema, then instead of starting the pipeline with the Start button, you need to select the "Full refresh" option from the dropdown next to the Start button. A full refresh reprocesses all of the source data and recreates the target table, so the new partition columns are applied.
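If you trigger pipeline updates through the Databricks REST API rather than the UI, you can request the same behaviour by setting full_refresh on the update call. This is a minimal sketch, assuming a personal access token; the workspace URL, pipeline ID, and token are placeholders for your own values.

import requests

# Placeholders: replace with your workspace URL, pipeline ID, and token
workspace_url = "https://<workspace-url>"
pipeline_id = "<pipeline-id>"
token = "<personal-access-token>"

# Start a pipeline update with full_refresh=True, which reprocesses all data
# and lets DLT recreate the table with the new partition columns
resp = requests.post(
    f"{workspace_url}/api/2.0/pipelines/{pipeline_id}/updates",
    headers={"Authorization": f"Bearer {token}"},
    json={"full_refresh": True},
)
resp.raise_for_status()
print(resp.json())  # contains the update_id of the triggered run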
Upvotes: 4