Reputation: 2158
I am trying to run multiple sub queries into a where clause, and I get the error below. Does mean that Hive doesn't support it? If not, is there a different way to write the query below?
Error occurred executing hive query: OK FAILED: SemanticException [Error 10249]: Line 14 Unsupported SubQuery Expression 'adh': Only 1 SubQuery expression is supported.
select
first_name,
last_name,
salary,
title,
department
from
employee_t1 emp
where
emp.salary <= 100000
and (
(emp.code in (select comp from history_t2 where code_hist <> 10))
or
(emp.adh in (select comp from sector_t3 where code_hist <> 50))
)
and department = 'Pediatrics';
Upvotes: 5
Views: 22577
Reputation: 13387
This is exactly what left semi join
is for:
select
distinct main.*
from
(
select
emp.first_name,
emp.last_name,
emp.salary,
emp.title,
emp.department
from
employee_t1 emp
left semi join
(select distinct comp from history_t2 where code_hist <> 10) emp_code on emp_code.comp=emp.code
where
emp.salary <= 100000 and emp.department = 'Pediatrics'
union all
select
emp.first_name,
emp.last_name,
emp.salary,
emp.title,
emp.department
from
employee_t1 emp
left semi join
(select distinct comp from sector_t3 where code_hist <> 50) emp_adh on emp_adh.comp=emp.adh
where
emp.salary <= 100000 and emp.department = 'Pediatrics'
) main
Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Upvotes: 0
Reputation: 21
Just adding a little note here. The error message states that hive supports only 1 subquery. That is in fact related to the limitation that hive has: "Only one subquery expression is supported for a single query".
You can refer to the official documentation here. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_data-access/content/hive-013-feature-subqueries-in-where-clauses.html
Upvotes: 0
Reputation: 1004
I agree with Gordon. using Joins you can try below query(not tested):
select
a.first_name,
a.last_name,
a.salary,
a.title,
a.department
from
(Select * from employee_t1 where
emp.salary <= 100000
and department = 'Pediatrics') a
left outer join (select comp from history_t2 where code_hist <> 10) b
on a.code = b.comp
left outer join (select comp from sector_t3 where code_hist <> 50) c
on a.adh = c.comp
where b.comp is not null
or c.comp is not null
;
Upvotes: 1
Reputation: 1269773
Two options. One is join
s and the other is union all
:
where emp.salary <= 100000 and
emp.code in (select comp
from history_t2
where code_hist <> 10
union all
select comp
from sector_t3
where code_hist <> 50
) and
emp.department = 'Pediatrics';
This isn't normally recommended because there are fewer options for optimization. But if Hive has this limitation (and I haven't tried this type of query in Hive), then this might be one way around it.
The join
method would be most appropriate if the comp
fields were unique in the two tables. Otherwise, you need to remove the duplicates to avoid duplication in the join
.
Upvotes: 0