Dhaval Patodiya
Dhaval Patodiya

Reputation: 76

Databricks delta live table throws error saying, "can't use generated columns"

I am using delta live tables for learning.

So, I created tables that look like this:

/* Silver Layer */

CREATE LIVE TABLE SSliverLive
(
  RowID         INT             COMMENT 'Primary key',  
  OrderDate     DATE,
  ShipDate      DATE,
  OrderYear     INT             GENERATED ALWAYS AS (YEAR (OrderDate)),
  ShipYear      INT             GENERATED ALWAYS AS (YEAR (ShipDate)),
)
USING DELTA
AS
SELECT *
FROM live.SBronzeLive;
/* Gold Layer */

CREATE LIVE TABLE SGoldLive
AS
SELECT rowid
       , OrderYear
       , COUNT(RowId) AS tt
FROM live.SSliverLive
GROUP BY rowid, orderyear;

But, the DLT pipelines throw the below error:

A generated column cannot use a non-existent column or another generated column.

I followed the SQLBits tutorial online and they can use the generated columns.

Removing the columns works fine but the Databricks website says I can use it.

Upvotes: 0

Views: 386

Answers (1)

JayashankarGS
JayashankarGS

Reputation: 8170

The documentation states that generated column values are automatically generated based on a user-specified function over other columns in the Delta table.

In your gold table, you mention only OrderYear, which is evaluated as YEAR(OrderDate) AS OrderYear, but you don't have OrderDate defined in the gold table, so it raises an error.

You can use OrderYear only when you include OrderDate in the SELECT statement, as shown below:

CREATE LIVE TABLE SGoldLive
 AS 
SELECT
OrderYear,
OrderDate
FROM live.SSliverLive

If you include OrderDate in the SELECT statement, you will need to add it to the GROUP BY clause, which may not produce the desired results.

Therefore, you should perform a GROUP BY on OrderYear without selecting it.

Here's an example using MONTH:

Silver table:

CREATE LIVE TABLE SSliverLive
(
  RowID         LONG             COMMENT 'Primary key',  
  OrderDate     DATE,
  orderMonth    INT              GENERATED ALWAYS AS (month(OrderDate))
)
USING DELTA
AS
SELECT
o_orderkey as RowID,
o_orderdate as OrderDate
FROM LIVE.SBronzeLive

Gold table:

CREATE LIVE TABLE SGoldLive
 AS 
SELECT
COUNT(RowId) AS tt
FROM live.SSliverLive
group by orderMonth

This produces output with a single column.

Alternatively, you can use the original column:

CREATE LIVE TABLE SGoldLive
 AS 
SELECT 
month(OrderDate) as ordMonth,
COUNT(RowId) AS tt
FROM live.SSliverLive
group by month(OrderDate)

Outputs:

enter image description here

Upvotes: 1

Related Questions