sharp
sharp

Reputation: 2158

Hive with multiple subqueries

I am trying to run multiple sub queries into a where clause, and I get the error below. Does mean that Hive doesn't support it? If not, is there a different way to write the query below?

Error occurred executing hive query: OK FAILED: SemanticException [Error 10249]: Line 14 Unsupported SubQuery Expression 'adh': Only 1 SubQuery expression is supported.

select
    first_name, 
    last_name,
    salary,
    title,
    department
from 
    employee_t1 emp
where 
    emp.salary <= 100000
    and (
        (emp.code in (select comp from history_t2 where code_hist <> 10))
        or 
        (emp.adh in (select comp from sector_t3 where code_hist <> 50))
    ) 
    and department = 'Pediatrics';

Upvotes: 5

Views: 22577

Answers (4)

Georgina Skibinski
Georgina Skibinski

Reputation: 13387

This is exactly what left semi join is for:

select
    distinct main.*
from
(
select
    emp.first_name, 
    emp.last_name,
    emp.salary,
    emp.title,
    emp.department
from 
    employee_t1 emp
left semi join 
        (select distinct comp from history_t2 where code_hist <> 10) emp_code on emp_code.comp=emp.code
where 
    emp.salary <= 100000 and emp.department = 'Pediatrics'
union all
select
    emp.first_name, 
    emp.last_name,
    emp.salary,
    emp.title,
    emp.department
from 
    employee_t1 emp
left semi join 
        (select distinct comp from sector_t3 where code_hist <> 50) emp_adh on emp_adh.comp=emp.adh
where 
    emp.salary <= 100000 and emp.department = 'Pediatrics'
) main

Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

Upvotes: 0

Mustafa
Mustafa

Reputation: 21

Just adding a little note here. The error message states that hive supports only 1 subquery. That is in fact related to the limitation that hive has: "Only one subquery expression is supported for a single query".

You can refer to the official documentation here. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_data-access/content/hive-013-feature-subqueries-in-where-clauses.html

Upvotes: 0

Ram Manohar
Ram Manohar

Reputation: 1004

I agree with Gordon. using Joins you can try below query(not tested):

 select
    a.first_name, 
    a.last_name,
    a.salary,
    a.title,
    a.department
from 
    (Select * from employee_t1 where 
    emp.salary <= 100000
    and department = 'Pediatrics') a
left outer join (select comp from history_t2 where code_hist <> 10) b
on a.code = b.comp   
left outer join  (select comp from sector_t3 where code_hist <> 50) c
on a.adh = c.comp
where b.comp is not null
or    c.comp is not null
;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269773

Two options. One is joins and the other is union all:

where emp.salary <= 100000 and
      emp.code in (select comp
                   from history_t2 
                   where code_hist <> 10
                   union all
                   select comp
                   from sector_t3
                   where code_hist <> 50
                  ) and
      emp.department = 'Pediatrics';

This isn't normally recommended because there are fewer options for optimization. But if Hive has this limitation (and I haven't tried this type of query in Hive), then this might be one way around it.

The join method would be most appropriate if the comp fields were unique in the two tables. Otherwise, you need to remove the duplicates to avoid duplication in the join.

Upvotes: 0

Related Questions