Eric
Eric

Reputation: 118

when the window function row_number in CTE generate the result in hive

I'm using row_number window function in hive CTE(with clause)

with data(
    select 1 a,1 b
union select 1,2
union select 1,3
union select 1,4
...
union select 1,26
),
data_with_row_num (
select a,b,row_number() over(partition by 1) as rn from data
)
select * from data_with_row_num
union all 
select * from data_with_row_num

I think the row_number should generate fixed id in the CTE block, I union the temp table(data_with_row_num) twice, and expect to get the same b with same rn. while I get the data such as

a b rn
1 4 1
1 2 1
1 9 2
1 3 2
...
1 19 26
1 24 26

it seems the same rn has different b , my question is why this happen .I think the rn should be generated in the with block, but it seems it is re-generated when it is used.

Upvotes: 0

Views: 29

Answers (1)

Eric
Eric

Reputation: 118

The reason is the CTE(Common Table Expressions) is just expression, which means it is just sth like a script, it will not materialize the result . I do the below query

with data(
    select 1 a,1 b
),
data_with_rn (
select a,b,rand(10)  from data
)
select * from data_with_rn
union all 
select * from data_with_rn

and the result is

a b EXPR$2
1 1 0.9514867562334854
1 1 0.9771762097973918

the random get the different result, if really need to materialize the temp result, cache table(in spark) might be a solution

cache table data select 1 a,1 b;
cache table data_with_rn select a,b,rand(10)  from data;
select * from data_with_rn
union all 
select * from data_with_rn

Upvotes: 0

Related Questions