Reputation: 2390
Given two tables:
filtered_locations
containing small set of data (only a few rows)
|-------------|
| loc<String> |
|-------------|
| ... |
|-------------|
table_clients
very large table (millions rows)
|--------------------------------------------|
| id | name| age |locations <array<String> |
|-----|--------------------------------------|
| | | | [a,b,c..] |
|--------------------------------------------|
I want to query table table_clients
for the value on filtered_locations
.
the main issue is that the field to query on the table_clients
is an array
type.
So, I exploded the column and then tried to embed a sub-query to include only the locations listed in filtered_locations
.
The first issue I faced is that Hive (at least the version I am running) seems not to accept a sub query insoide the in
or exists
statment.
That's the error I get:
Error while compiling statement: FAILED: SemanticException Invalid column reference 'location' in definition of SubQuery sq_1 [ tc.location in ( select fl.loc from filtered_locations fl ) ] used as sq_1
As an alternative I tried to use a LEFT JOIN
but also does not work because of the explode
call
Second error
Error while compiling statement: FAILED: SemanticException [Error 10085]: JOIN with a LATERAL VIEW is not supported 'location'
with filtered_locations as (
SELECT
'loc1' as loc
union all
'loc2' as loc
)
select
id, name, location
max(age) as max_age
from
table_clients tc
LATERAL VIEW EXPLODE(locations) l as location
-- Ideally this should work!
-- where
-- tc.location in (
-- select fl.loc from filtered_locations fl
-- )
left join filtered_locations fl
on fl.loc = tc.location
group by id, name, location
What is then the best solution for my problem? Note that the table_clients
has millions of records!
Thanks
Upvotes: -1
Views: 3236
Reputation: 44921
Theoretically, this should have worked
select *
from table_clients c
lateral view explode(location) e as loc
where e.loc in (select l.loc from filtered_locations l)
;
FAILED: SemanticException [Error 10009]: Line 6:8 Invalid table alias 'e'
... but since it does not, a little work-around is needed
select *
from (select *
from table_clients c
lateral view explode(location) e as loc
) c
where c.loc in (select l.loc from filtered_locations l)
;
Upvotes: 2