Alg_D
Alg_D

Reputation: 2390

Hive - Explode array column and use LEFT join or sub-query with select statement Error

Given two tables:

filtered_locations containing small set of data (only a few rows)

|-------------|
| loc<String> | 
|-------------|
|     ...     |
|-------------|   

table_clients very large table (millions rows)

 |--------------------------------------------|
 | id  | name|  age |locations <array<String> | 
 |-----|--------------------------------------|
 |     |     |      | [a,b,c..]               |
 |--------------------------------------------|

I want to query table table_clients for the value on filtered_locations. the main issue is that the field to query on the table_clients is an array type.

So, I exploded the column and then tried to embed a sub-query to include only the locations listed in filtered_locations.

The first issue I faced is that Hive (at least the version I am running) seems not to accept a sub query insoide the in or exists statment.

That's the error I get:

Error while compiling statement: FAILED: SemanticException Invalid column reference 'location' in definition of SubQuery sq_1 [ tc.location in ( select fl.loc from filtered_locations fl ) ] used as sq_1

As an alternative I tried to use a LEFT JOIN but also does not work because of the explode call Second error

Error while compiling statement: FAILED: SemanticException [Error 10085]: JOIN with a LATERAL VIEW is not supported 'location'

with filtered_locations as (
  SELECT
    'loc1' as loc
    union all
    'loc2' as loc
)

select 
  id, name, location
  max(age) as max_age 
from
  table_clients tc
  LATERAL VIEW EXPLODE(locations) l as location
-- Ideally this should work!
-- where
--  tc.location in (
--     select fl.loc from filtered_locations fl
--  )
left join filtered_locations fl
on fl.loc = tc.location

group by id, name, location

What is then the best solution for my problem? Note that the table_clients has millions of records!

Thanks

Upvotes: -1

Views: 3236

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44921

Theoretically, this should have worked

select  *

from    table_clients c
        lateral view explode(location) e as loc

where   e.loc in (select l.loc from filtered_locations l)
;

FAILED: SemanticException [Error 10009]: Line 6:8 Invalid table alias 'e'

... but since it does not, a little work-around is needed

select  *

from   (select  *

        from    table_clients c
                lateral view explode(location) e as loc
        ) c        

where   c.loc in (select l.loc from filtered_locations l)
;

Upvotes: 2

Related Questions