lydias
lydias

Reputation: 841

SemanticException[Error 10004] Invalid table alias or column reference

I'm trying to take the records by the first row in each group by id. the row number is generated by the order of id. I have been keeping getting the same error[10004].

This hive script is built in SAS studio environment and the following is a segment of proc sql code:

dataset id_f contains variable x,y and z

execute (create temporary table event_id_f1 as
        select *,ROW_NUMBER() OVER(ORDER BY id) as rownum
        from id_f
                        group by id
                        having rownum = min(rownum)
        )
by df20;

Error: Execute error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:129 Invalid table alias or column reference 'rownum': (possible column names are: x,y,z)

Appreciate any help!

@bernie suggested that Hive does not recognize alias. How should I nest the alias to make it work? The code is in an execute statement too. My attempt:

execute (create temporary table event_id_f1 as

        from (
             select *,
             ROW_NUMBER() OVER(ORDER BY id) as rownum
             from id_f
              group by id
              having rownum = min(rownum) ) ranked
             WHERE ranked=1
             )
by df20;

Upvotes: 0

Views: 4720

Answers (2)

damientseng
damientseng

Reputation: 553

My humble opinion is that the following SQL is problematic(taken from another answer):

select * 
from (
    select id, ROW_NUMBER() OVER (ORDER BY id) as rownum
    from id_f
    group by id 
) ranked
where ranked.rownum = 1

This query breaks into three phases:

First, select id from id_f group by id, which is equivalent to getting all the distinct id.

Then, a global windowing to order by these different ids and give each of them a row number.

Finally, the where ranked.rownum = 1 filter makes the result set contain only one row:

minimal_id, 1

If that's really all you want, use min should be much better.

This question didn't make much sense to me at first glance. I guess what you're looking for is to partition by id, and within each partition, order by something_else, and then choose the row ranked first within those distinct partitions. If luckily my guess is right, I'd suggest sth like this:

select 
    * 
from (
    select 
        *, ROW_NUMBER() OVER (partition by id ORDER BY STH_ELSE) as rownum
    from id_f
) ranked
where ranked.rownum = 1

Upvotes: 1

mechanical_meat
mechanical_meat

Reputation: 169274

This is the way I'd write it. Let me know if there's any error:

execute (create temporary table event_id_f1 as
    select * 
    from (
        select id, ROW_NUMBER() OVER (ORDER BY id) as rownum
        from id_f
        group by id 
    ) ranked
    where ranked.rownum = 1         
) by df20;

Upvotes: 2

Related Questions