Reputation: 118
I'm using row_number window function in hive CTE(with clause)
with data(
select 1 a,1 b
union select 1,2
union select 1,3
union select 1,4
...
union select 1,26
),
data_with_row_num (
select a,b,row_number() over(partition by 1) as rn from data
)
select * from data_with_row_num
union all
select * from data_with_row_num
I think the row_number should generate fixed id in the CTE block, I union the temp table(data_with_row_num) twice, and expect to get the same b with same rn. while I get the data such as
a b rn
1 4 1
1 2 1
1 9 2
1 3 2
...
1 19 26
1 24 26
it seems the same rn has different b , my question is why this happen .I think the rn should be generated in the with block, but it seems it is re-generated when it is used.
Upvotes: 0
Views: 29
Reputation: 118
The reason is the CTE(Common Table Expressions) is just expression, which means it is just sth like a script, it will not materialize the result . I do the below query
with data(
select 1 a,1 b
),
data_with_rn (
select a,b,rand(10) from data
)
select * from data_with_rn
union all
select * from data_with_rn
and the result is
a b EXPR$2
1 1 0.9514867562334854
1 1 0.9771762097973918
the random get the different result, if really need to materialize the temp result, cache table(in spark) might be a solution
cache table data select 1 a,1 b;
cache table data_with_rn select a,b,rand(10) from data;
select * from data_with_rn
union all
select * from data_with_rn
Upvotes: 0