pavan kumar reddy
pavan kumar reddy

Reputation: 193

HOW CTE (Common Table Expression) in HIVE gets evaluated

My question is around performance and the way a CTE gets evaluated in runtime.

I am planning to reuse code by defining a base projection and then defining multiple CTE's on top of this base projection with different filters.

Does that cause any performance issues.More specifically, does base projection will be evaluated every time.

For example:

WITH CTE_PERSON as (
   SELECT * FROM PersonTable
),


CTE_PERSON_WITH_AGE as (
   SELECT * FROM CTE_PERSON WHERE age > 24 
),

CTE_PERSON_WITH_AGE_AND_GENDER as (
  SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender = 'm'
),

CTE_PERSON_WITH_NAME as (
  SELECT * FROM CTE_PERSON WHERE name = 'abc'
)

Upvotes: 6

Views: 13177

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44941

A single scan.

Note:
- a single stage
- a single TableScan
- predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)


create table t (i int,j int,k int);

explain 
with    t1 as (select i,j,k from t  where i=1)
       ,t2 as (select i,j,k from t1 where j=2)
       ,t3 as (select i,j,k from t2 where k=3) 

select * from t3
;

Explain
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: t
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Filter Operator
            predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: 1 (type: int), 2 (type: int), 3 (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              ListSink

Upvotes: 8

Related Questions