Reputation: 76
I am using delta live tables for learning.
So, I created tables that look like this:
/* Silver Layer */
CREATE LIVE TABLE SSliverLive
(
RowID INT COMMENT 'Primary key',
OrderDate DATE,
ShipDate DATE,
OrderYear INT GENERATED ALWAYS AS (YEAR (OrderDate)),
ShipYear INT GENERATED ALWAYS AS (YEAR (ShipDate)),
)
USING DELTA
AS
SELECT *
FROM live.SBronzeLive;
/* Gold Layer */
CREATE LIVE TABLE SGoldLive
AS
SELECT rowid
, OrderYear
, COUNT(RowId) AS tt
FROM live.SSliverLive
GROUP BY rowid, orderyear;
But, the DLT pipelines throw the below error:
A generated column cannot use a non-existent column or another generated column.
I followed the SQLBits tutorial online and they can use the generated columns.
Removing the columns works fine but the Databricks website says I can use it.
Upvotes: 0
Views: 386
Reputation: 8170
The documentation states that generated column values are automatically generated based on a user-specified function over other columns in the Delta table.
In your gold table, you mention only OrderYear
, which is evaluated as YEAR(OrderDate) AS OrderYear
, but you don't have OrderDate
defined in the gold table, so it raises an error.
You can use OrderYear
only when you include OrderDate
in the SELECT statement, as shown below:
CREATE LIVE TABLE SGoldLive
AS
SELECT
OrderYear,
OrderDate
FROM live.SSliverLive
If you include OrderDate
in the SELECT statement, you will need to add it to the GROUP BY clause, which may not produce the desired results.
Therefore, you should perform a GROUP BY on OrderYear
without selecting it.
Here's an example using MONTH
:
Silver table:
CREATE LIVE TABLE SSliverLive
(
RowID LONG COMMENT 'Primary key',
OrderDate DATE,
orderMonth INT GENERATED ALWAYS AS (month(OrderDate))
)
USING DELTA
AS
SELECT
o_orderkey as RowID,
o_orderdate as OrderDate
FROM LIVE.SBronzeLive
Gold table:
CREATE LIVE TABLE SGoldLive
AS
SELECT
COUNT(RowId) AS tt
FROM live.SSliverLive
group by orderMonth
This produces output with a single column.
Alternatively, you can use the original column:
CREATE LIVE TABLE SGoldLive
AS
SELECT
month(OrderDate) as ordMonth,
COUNT(RowId) AS tt
FROM live.SSliverLive
group by month(OrderDate)
Outputs:
Upvotes: 1