Mateo
Mateo

Reputation: 1604

Is there a function equivalent to Hive's 'explode' function in Apache Impala?

Hive's function explode is documented here It is essentially a very practical function that generates many rows from a single one. Its basic version takes a column whose value is an array of values and produces a copy of the same row for each of those values.

I wonder whether such a thing exists in Impala. I haven't been able to find it in the documentation.

Upvotes: 2

Views: 7751

Answers (3)

xiaojie
xiaojie

Reputation: 21

Lateral views. In CDH 5.5 / Impala 2.3 and higher, Impala supports queries on complex types (STRUCT, ARRAY, or MAP), using join notation rather than the EXPLODE() keyword. See Complex Types (CDH 5.5 or higher only) for details about Impala support for complex types.

use another table B that unfold from array of values. and then use table a inner join table b.

See Impala Doc https://docs.cloudera.com/documentation/enterprise/5-9-x/topics/impala_langref_unsupported.html

Upvotes: 1

amanas
amanas

Reputation: 520

The very very tricky approach:

with A as (select 'row 1' as key, 'a;b;c' as value
           union all
           select 'row 2' as key, 'd;e'   as value
           union all
           select 'row 3' as key, 'f'     as value),
     B as (select *, length(value) - length(regexp_replace(value,';','')) + 1 as n from A),
     -- assuming you have at lest as many rows as different values in a single row
     C as (select row_number() over(order by key) as seq, n from B), 
     D as (select seq from C where seq <= (select max(n) from C))
select key, value, split_part(value,';',seq) as part
  from B
 cross join D
 where seq <= n
 order by key,seq

Important: notice the comment "assuming you have at least as many rows as different values in a single row".

Just run it and see the result.

Upvotes: 2

Impala does not have any function like EXPLODE in hive to read complex data types and generate multiple rows.

Currently through Impala, we can just read the complex data types in Hive generated tables using dot notation like select employee.empid from table1 .

Impala can query complex type columns only from Parquet tables or Parquet partitions within partitioned tables

Upvotes: 2

Related Questions