Reputation: 81
I am new to pentaho. Basically i have a requirement in hierarchy structured data where i select a sub set of hierarchy and i need to do count / sum on lower level nodes then go up level by level doing count + previous sum of count etc. This is all i know how to do sql using table variable (main point here is that i need to do this in memory and not do any updates in data table.)
DECLARE @ConsolidateMaster TABLE
(
-- my fields
)
INSERT @ConsolidateMaster
--select some sub set data and inset into table variable
UPDATE A
SET A.[DIR_REPORT_CNT] = 0,
A.[DIR_REPORT_COST] = 0,
A.[TOTAL_REPORT_CNT] = 0,
A.[TOTAL_REPORT_COST] = 0
FROM @ConsolidateMaster A
LEFT JOIN @ConsolidateMaster B
ON A.EMP_ID = B.SUPERVISOR_ID
WHERE B.EMP_ID IS NULL AND A.PROJ_CD = @PROJ_CD
--GO UP LEVEL BY LEVEL AND START COUNTING THE TOTAL + COST
DECLARE @MAX_LEVEL INT = (SELECT MAX(SUPERVISOR_LAYER) FROM @ConsolidateMaster WHERE PROJ_CD = @PROJ_CD)
WHILE (SELECT COUNT(1) FROM @ConsolidateMaster WHERE [DIR_REPORT_CNT] IS NULL) > 0
BEGIN
UPDATE
A
SET
A.DIR_REPORT_CNT = B.DIR_REPORT_CNT,
A.DIR_REPORT_COST = B.DIR_REPORT_COST,
A.TOTAL_REPORT_CNT = B.TOTAL_REPORT_CNT,
A.TOTAL_REPORT_COST = B.TOTAL_REPORT_COST
FROM @ConsolidateMaster A
INNER JOIN
(
SELECT
C.SUPERVISOR_ID,
COUNT(1) DIR_REPORT_CNT,
SUM(ISNULL(ANNUAL_COST,0)) DIR_REPORT_COST,
SUM(C.[TOTAL_REPORT_CNT]) + COUNT(1) TOTAL_REPORT_CNT,
SUM(C.TOTAL_REPORT_COST) + SUM(ISNULL(ANNUAL_COST,0)) TOTAL_REPORT_COST
FROM @ConsolidateMaster C
WHERE
C.SUPERVISOR_LAYER = @MAX_LEVEL
AND C.PROJ_CD = @PROJ_CD
GROUP BY
C.SUPERVISOR_ID
) B
ON A.EMP_ID = B.SUPERVISOR_ID
WHERE A.PROJ_CD = @PROJ_CD
AND A.SUPERVISOR_LAYER < @MAX_LEVEL
SET @MAX_LEVEL = @MAX_LEVEL - 1
END
After doing updates on the temp variable in the sql i would just select * from @ConsolidateMaster . How can i implement similar functionality in pentaho.
Upvotes: 0
Views: 537
Reputation: 1196
You shouldn't ask how to mimic a Table datatype in Kettle. You found a solution to your problem with T-SQL, I understand, but T-SQL and Kettle don't work at the same level of abstraction. Forget about your solution for a while and concentrate on your problem: "How to extract hierarchical data stored as an adjacency list in a database (SQL Server) and aggregate (count, sum) certain columns using a dataflow tool (Kettle)?".
There's a step named Closure Generator
that's able to calculate the transitive closure of your relation ConsolidateMaster
and emit tuples [supervisor_id, emp_id, distance]. Merge-Join
, Sort Rows
, and Group By
should produce the results you're looking for.
Upvotes: 1