Reputation: 841
I'm trying to take the records by the first row in each group by id. the row number is generated by the order of id. I have been keeping getting the same error[10004].
This hive script is built in SAS studio environment and the following is a segment of proc sql code:
dataset id_f contains variable x,y and z
execute (create temporary table event_id_f1 as
select *,ROW_NUMBER() OVER(ORDER BY id) as rownum
from id_f
group by id
having rownum = min(rownum)
)
by df20;
Error: Execute error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:129 Invalid table alias or column reference 'rownum': (possible column names are: x,y,z)
Appreciate any help!
@bernie suggested that Hive does not recognize alias. How should I nest the alias to make it work? The code is in an execute statement too. My attempt:
execute (create temporary table event_id_f1 as
from (
select *,
ROW_NUMBER() OVER(ORDER BY id) as rownum
from id_f
group by id
having rownum = min(rownum) ) ranked
WHERE ranked=1
)
by df20;
Upvotes: 0
Views: 4720
Reputation: 553
My humble opinion is that the following SQL is problematic(taken from another answer):
select *
from (
select id, ROW_NUMBER() OVER (ORDER BY id) as rownum
from id_f
group by id
) ranked
where ranked.rownum = 1
This query breaks into three phases:
First, select id from id_f group by id
, which is equivalent to getting all the distinct id
.
Then, a global windowing to order by
these different ids and give each of them a row number.
Finally, the where ranked.rownum = 1
filter makes the result set contain only one row:
minimal_id, 1
If that's really all you want, use min
should be much better.
This question didn't make much sense to me at first glance. I guess what you're looking for is to partition by id
, and within each partition, order by something_else
, and then choose the row ranked first within those distinct partitions. If luckily my guess is right, I'd suggest sth like this:
select
*
from (
select
*, ROW_NUMBER() OVER (partition by id ORDER BY STH_ELSE) as rownum
from id_f
) ranked
where ranked.rownum = 1
Upvotes: 1
Reputation: 169274
This is the way I'd write it. Let me know if there's any error:
execute (create temporary table event_id_f1 as
select *
from (
select id, ROW_NUMBER() OVER (ORDER BY id) as rownum
from id_f
group by id
) ranked
where ranked.rownum = 1
) by df20;
Upvotes: 2