Pierre Galland
Pierre Galland

Reputation: 113

Hive : casting array<string> to array<int> in query

I have two tables :

create table a (
`1` array<string>);

create table b (
`1` array<int>);

and I want to put the table a in table b (table b is empty) :

insert into table b
select * from a;

when doing so I get the following error :

FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into
target table because column number/types are different 'b': Cannot
convert column 0 from array<string> to array<int>.

whereas I would not get this error if the fields were only of types string and int.

Is there a way to do the cast with arrays ?

Upvotes: 6

Views: 23365

Answers (3)

Harsha Vardhan
Harsha Vardhan

Reputation: 139

Brickhouse jar will do this a lot faster than casting them and collecting it back as a list . Add this jar to a hdfs location.
Use the link below to download the brick house jar

add jar hdfs://hadoop-/pathtojar/brickhouse-0.7.1.jar;   
create temporary function cast_array as 'brickhouse.udf.collect.CastArrayUDF';   
select cast_array(columns, 'int') AS columname from table;  
select cast_array(columns, 'string') AS columname from table

Upvotes: 3

achompas
achompas

Reputation: 130

Is there a way to do the cast with arrays ?

Not easily. You can manually cast the arrays if you know their size, but if not you might need to use structs. See my answer to this similar question.


Also: I cannot downvote the other answer, but it fails for nested selects with more than one array.

Instead of casting array elements and reconstructing the original arrays, it casts and then combines all elements into a single array. Example:

hive> select id, my_array from array_table limit 3;
OK
10023307    ["0.20296966","0.17753501","-0.03543373"]
100308007   ["0.16155224","0.1945944","0.09167781"]
100384207   ["0.025892768","0.023214806","-0.003712816"]

hive> select
    >     collect_list(cast(array_element as double)) int_array
    > from (
    >     select
    >         explode(my_array) array_element
    >     from (
    >         select
    >             my_array
    >         from array_table limit 3
    >     ) X
    > ) s;
OK
[0.20296966,0.17753501,-0.03543373,0.16155224,0.1945944,0.09167781,0.025892768,0.023214806,-0.003712816]

Upvotes: 0

leftjoin
leftjoin

Reputation: 38290

Re-assemble array using explode() and collect_list().

Initial String array example:

hive> select array('1','2','3') string_array;
OK
string_array
["1","2","3"]
Time taken: 1.109 seconds, Fetched: 1 row(s)

Convert array:

hive> select collect_list(cast(array_element as int)) int_array --cast and collect array
       from( select explode(string_array) array_element         --explode array
               from (select array('1','2','3') string_array     --initial array
                    )s 
           )s;

Result:

OK
int_array
[1,2,3]
Time taken: 44.668 seconds, Fetched: 1 row(s)

And if you want to add more columns in your insert+select query then use lateral view [outer]:

select col1, col2, collect_list(cast(array_element as int)) int_array
 from
(
select col1, col2 , array_element         
  from table
       lateral view outer explode(string_array) a as array_element         
)s
group by col1, col2
;

Upvotes: 3

Related Questions